Kodecraft developers regularly post tech and AI links into Mattermost as a way to share discoveries with the team. The links live in a communication medium, not a knowledge base.
Past links become practically unfindable — they scroll out of view, mix with unrelated chat, and cannot be referenced systematically. A link posted six months ago that would solve today's problem is effectively lost.
Volume is high enough that even recently shared links get missed. There is no filter for what matters to Kodecraft, no record of past evaluations, and no signal when a tool is superseded.
A periodic poll fetches new posts from scoped Mattermost channels. URLs are canonicalized and deduped, then a heuristic filter rejects obvious non-tech links before any LLM call.
A cheap classifier (Stage 1) labels surviving links and narrows project context. A capable evaluator (Stage 2) produces the structured record that drives writes to Outline (full record) and the Mattermost Hub (top-5 ranking).
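As a sketch, the stage ordering looks roughly like this; every helper name below (`extract_urls`, `registry`, `heuristic_reject`, `stage1_classify`, `stage2_evaluate`, `outline`, `hub`) is a hypothetical placeholder, not the real implementation:

```python
# Minimal sketch of the end-to-end ordering described above.
def process_post(post):
    for raw_url in extract_urls(post):
        url = canonicalize(raw_url)              # strip trackers, resolve redirects
        if registry.seen(url):                   # dedupe against the link registry
            continue
        if heuristic_reject(url):                # YAML rule filter, no LLM cost
            continue
        label = stage1_classify(url)             # cheap model: label + candidate_slugs
        if label.category == "general" and label.confidence in ("medium", "high"):
            record = record_from_label(label)    # short-circuit: Stage 2 skipped
        else:
            record = stage2_evaluate(url, label.candidate_slugs)
        outline.write(record)                    # full record: source of truth
        hub.mark_dirty(record.category)          # pinned hub post re-rendered on dispatch
```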
Developers post AI tools, libraries, frameworks, and articles in Mattermost as fast as they encounter them. This is the input surface, and it already exists.
The pipeline does not change how people share. It changes what happens after a link is posted: every URL is captured, canonicalized, deduped, and routed through evaluation — without anyone needing to do anything new.
Every evaluated link lives in Outline as a structured entry: classification, relevance score, pros/cons, project routing, freshness state. Multiple docs split the surface — Workflow, per-project Dependencies, General, and an append-only Evaluation History.
This is the source of truth. Search, audit, freshness — all here.
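Illustratively, an entry carries something like the following shape; the field names are assumptions inferred from the attributes listed above, not the actual Outline schema:

```python
from dataclasses import dataclass, field

@dataclass
class OutlineEntry:
    url: str
    classification: str           # workflow / dependency / both / general
    relevance_score: float        # assumed 0.0 - 1.0 scale
    pros: list[str] = field(default_factory=list)
    cons: list[str] = field(default_factory=list)
    project_slugs: list[str] = field(default_factory=list)  # project routing
    disposition: str = "active"   # active / adopted / rejected / obsolete / superseded
    freshness: str = "fresh"      # fresh / due_for_review / awaiting_evaluation / ...
```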
One pinned post per category in #tech-enrichment-hub: Workflow, Dependency (per project), General. Each post is edited in place on every dirty-flag dispatch — no scrollback, no growing thread.
Outline = full record. Mattermost Hub = "what should I look at next?"
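Edit-in-place maps onto Mattermost's `PUT /api/v4/posts/{post_id}` endpoint. A hedged sketch, with the base URL, bot token, and post-id map as deployment-specific assumptions:

```python
import requests

MM_URL = "https://mattermost.example.com"   # placeholder
MM_TOKEN = "bot-token"                      # placeholder
HUB_POST_IDS = {"workflow": "post-id"}      # one pinned post per category

def update_hub_post(category: str, rendered_markdown: str) -> None:
    post_id = HUB_POST_IDS[category]
    resp = requests.put(
        f"{MM_URL}/api/v4/posts/{post_id}",
        headers={"Authorization": f"Bearer {MM_TOKEN}"},
        json={"id": post_id, "message": rendered_markdown},
        timeout=10,
    )
    resp.raise_for_status()   # edited in place: no new post, no growing thread
```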
Each new entry gets a threaded reply in #tech-links with two seed reactions: 🟢 ("I used this and it worked") and 🔴 ("I tried it; it didn't fit"). No third "haven't used" — that creates social pressure and noisy signal.
Non-engagement is computed, not declared. Reactions are retractable; current state is authoritative.
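A sketch of the seeding step against Mattermost's v4 REST API; the bot user id and the emoji names for 🟢 / 🔴 are assumptions:

```python
import requests

BOT_USER_ID = "bot-user-id"   # placeholder; MM_URL / MM_TOKEN as in the sketch above

def seed_reactions(channel_id: str, root_post_id: str, summary: str) -> None:
    headers = {"Authorization": f"Bearer {MM_TOKEN}"}
    # Threaded reply under the original share in #tech-links.
    reply = requests.post(
        f"{MM_URL}/api/v4/posts",
        headers=headers,
        json={"channel_id": channel_id, "root_id": root_post_id, "message": summary},
        timeout=10,
    ).json()
    # Seed exactly two affordances: worked / didn't fit.
    for emoji in ("large_green_circle", "red_circle"):   # assumed emoji names
        requests.post(
            f"{MM_URL}/api/v4/reactions",
            headers=headers,
            json={"user_id": BOT_USER_ID, "post_id": reply["id"], "emoji_name": emoji},
            timeout=10,
        ).raise_for_status()
```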
URLs are canonicalized — tracking parameters stripped, redirects resolved, GitHub URLs collapsed to owner/repo. Duplicates against the link registry are dropped here.
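A minimal sketch of that canonicalization pass; the tracking-parameter list is an assumption and the real implementation may differ:

```python
import re
import requests
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed tracker list; extend as new parameters show up in the wild.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "fbclid", "gclid", "ref"}

def canonicalize(url: str) -> str:
    # Resolve redirects first so shorteners collapse to their real target.
    # (HEAD keeps this cheap; some hosts may need a GET fallback.)
    final = requests.head(url, allow_redirects=True, timeout=10).url
    parts = urlsplit(final)
    if parts.netloc == "github.com":
        m = re.match(r"^/([^/]+)/([^/]+)", parts.path)
        if m:   # collapse deep GitHub paths to owner/repo
            return f"https://github.com/{m.group(1)}/{m.group(2)}"
    query = urlencode([(k, v) for k, v in parse_qsl(parts.query)
                       if k not in TRACKING_PARAMS])
    return urlunsplit((parts.scheme, parts.netloc, parts.path.rstrip("/"), query, ""))
```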
A small YAML rule set (config/heuristic_rules.yaml) rejects domains and URL patterns that aren't worth evaluating: pure social media, internal Kodecraft URLs, unrelated content.
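A sketch of the reject step, assuming a rule file with `blocked_domains` and `blocked_patterns` keys (the actual schema of config/heuristic_rules.yaml may differ):

```python
import re
import yaml  # PyYAML

with open("config/heuristic_rules.yaml") as f:
    RULES = yaml.safe_load(f) or {}

def heuristic_reject(url: str) -> bool:
    host = re.match(r"https?://(?:www\.)?([^/]+)", url)
    if host and host.group(1) in RULES.get("blocked_domains", []):
        return True   # e.g. pure social media, internal Kodecraft hosts
    return any(re.search(p, url) for p in RULES.get("blocked_patterns", []))
```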
A small model (gpt-oss-120) sees the link, fetched content, the project taxonomy, and all per-project PRDs. It labels the link as workflow / dependency / both / general / neither.
Output also carries candidate_slugs — projects most likely affected — which Stage 2 will use to narrow its expensive context. general with medium+ confidence short-circuits Stage 2 entirely.
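The output shape, sketched with assumed field names (only `candidate_slugs` and the label set come from this spec):

```python
from typing import Literal
from pydantic import BaseModel

class Stage1Label(BaseModel):
    category: Literal["workflow", "dependency", "both", "general", "neither"]
    confidence: Literal["low", "medium", "high"]
    candidate_slugs: list[str]   # projects most likely affected

def short_circuits_stage2(label: Stage1Label) -> bool:
    # general at medium+ confidence skips the expensive evaluator entirely
    return label.category == "general" and label.confidence in ("medium", "high")
```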
A capable model (gpt-oss-120) consumes per-project PRD + dependency context, narrowed by Stage 1's candidate_slugs within a 4000-token budget per axis.
Output is a strict JSON schema. matches_existing filters per project but preserves the global adopted signal — already-adopted excludes that project, not the link. Fetched content is treated as untrusted data, not instructions.
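A sketch of the schema and the per-project filter; apart from `matches_existing`, field names are illustrative:

```python
from pydantic import BaseModel

class ProjectMatch(BaseModel):
    slug: str
    matches_existing: bool   # tool already present in this project's dependency set

class Stage2Record(BaseModel):
    url: str
    relevance_score: float
    pros: list[str]
    cons: list[str]
    projects: list[ProjectMatch]

def route(record: Stage2Record) -> list[str]:
    # Already-adopted excludes the project, not the link: the record itself
    # survives (preserving the global adopted signal), only routing is filtered.
    return [p.slug for p in record.projects if not p.matches_existing]
```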
Langfuse captured 950 observations across 575 traces (Apr 26 – May 8), but model pricing wasn't configured, so cost_details are null for all calls.
Figures are therefore estimates: $0.15 per 1M input tokens and $0.60 per 1M output tokens, applied to observed token counts from 51 clean two-stage runs.
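For reproducibility, the estimate reduces to this arithmetic over the observed token counts:

```python
PRICE_IN = 0.15 / 1_000_000    # $ per input token (assumed rate above)
PRICE_OUT = 0.60 / 1_000_000   # $ per output token

def estimated_cost(runs: list[tuple[int, int]]) -> float:
    # runs: (input_tokens, output_tokens) pairs from the 51 clean two-stage runs
    return sum(i * PRICE_IN + o * PRICE_OUT for i, o in runs)
```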
| Disposition state | Meaning |
|---|---|
| 🟢 active | Current recommendation or informational entry. |
| ✅ adopted | Team signals indicate Kodecraft is using it. |
| 🔴 rejected | Team or pipeline determined it's not useful. |
| ⚫ obsolete | Tool itself is no longer viable (archived, deprecated). |
| 🟡 superseded | Another candidate replaced it as the recommendation. |
| Freshness state | Meaning |
|---|---|
| ✓ fresh | Latest evaluation is recent enough to trust. |
| ◷ due_for_review | Still usable, but the check window has expired. |
| ⏳ awaiting_evaluation | Queued for a re-check. |
| ⚠ reevaluation_failed | Last re-check failed; prior analysis remains visible. |
| Record type | Review cadence |
|---|---|
| active recommendation | 30 days |
| alternative candidate | 14 days |
| adopted | 60 days |
| general current-state item | 60 days |
| rejected | no scheduled review |
| obsolete or superseded | no scheduled review unless manually targeted |
Failed re-evaluations preserve the prior payload; obsolete requires an evidence-backed reason; superseded requires a successor reference. Age alone never makes a tool obsolete.
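The cadences above reduce to a small lookup; a sketch (the record-type keys are illustrative):

```python
from datetime import datetime, timedelta

# Review cadences from the table above; None means no scheduled review.
REVIEW_DAYS = {
    "active_recommendation": 30,
    "alternative_candidate": 14,
    "adopted": 60,
    "general_current_state": 60,
    "rejected": None,
    "obsolete": None,
    "superseded": None,
}

def next_review(record_type: str, last_evaluated: datetime) -> datetime | None:
    days = REVIEW_DAYS.get(record_type)
    return last_evaluated + timedelta(days=days) if days else None
```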
| Phase | Trigger | System action |
|---|---|---|
| Newly surfaced | Entry just created | Threaded reply with 🟢 / 🔴 seed reactions |
| Active | Entry visible, accumulating signals | Counts tracked; no notification |
| Low-engagement | Age > 6 weeks, low reaction rate | Follow-up prompt in channel |
| Persistent silence | Long quiet period after prompts | Marked "awaiting evaluation"; remains visible |
| Adopted / Rejected | Sufficient 🟢 / 🔴 signals | Disposition transition; appended to history |
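A sketch of the low-engagement trigger; only the 6-week age gate comes from the table, the reaction threshold is a hypothetical placeholder:

```python
from datetime import datetime, timedelta

def needs_followup(created_at: datetime, reaction_count: int,
                   followups_sent: int, now: datetime | None = None) -> bool:
    now = now or datetime.utcnow()
    old_enough = now - created_at > timedelta(weeks=6)   # age gate from the table
    low_rate = reaction_count < 2                        # assumed threshold
    return old_enough and low_rate and followups_sent == 0
```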
Today's retrieval is direct: PRD bodies and dependency-doc sections are loaded from Outline and injected into prompts within token budgets. This is simple and sufficient for the current corpus.
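A sketch of the budget-bounded injection, assuming a tiktoken encoding (the pipeline's real tokenizer may differ):

```python
import tiktoken

ENC = tiktoken.get_encoding("o200k_base")   # assumed encoding choice

def fit_to_budget(text: str, budget: int = 4000) -> str:
    # Truncate loaded document text to the per-axis token budget
    # before it enters the prompt.
    tokens = ENC.encode(text)
    return text if len(tokens) <= budget else ENC.decode(tokens[:budget])
```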
RAG enters when either of two signals appears: the LLM misses semantic matches (adjacent concepts fail to surface because queries target specific names), or token cost from full-document loads becomes a material fraction of evaluation spend. Either signal earns RAG its place; neither earns it preemptively.
Parallel future input: claude-mem workflow telemetry, captured from real Claude Code / OpenCode sessions, will feed workflow-track context once that project ships.
Lance Alexander Ventura · AI Engineer Intern · lance@kodecraft.dev