# Memory & Knowledge System
nyxCore implements a two-layer memory system. The first layer is explicit: you add it manually. The second layer is machine-extracted from workflow review steps and grows automatically with every run.
## Layer 1: MemoryEntry (Manual Knowledge Base)
MemoryEntry is the general knowledge base. You create entries with titles, content, and tags. Content can originate from three sources:
- `"manual"`: created directly in the dashboard
- `"github"`: synced from `.memory/letter_*.md` files in a GitHub repository
- `"import"`: bulk imported
MemoryEntries are full-text searchable via PostgreSQL `tsvector`. They're not directly injected into workflows; they're a reference library you can consult from the dashboard.
## Layer 2: WorkflowInsight (Structured, Vectorized)
WorkflowInsight is the machine-extracted layer. These records are created when you approve key points from a workflow review step.
| Field | Type | Purpose |
|---|---|---|
| `title` | String | Insight headline |
| `detail` | Text | Full description |
| `suggestion` | Text? | Recommended action |
| `category` | String | Domain area (Security, Architecture, ...) |
| `insightType` | String | `pain_point`, `strength`, `solution`, `pattern`, `decision` |
| `severity` | String? | `blocker`, `high`, `medium`, `low` |
| `pairedInsightId` | UUID? | Link to matching pain/solution |
| `embedding` | vector(1536) | For semantic search |
| `searchVector` | tsvector | For full-text search |
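The schema above can be mirrored as a TypeScript type. This is a sketch derived only from the table: field names and optionality follow it, but the actual Prisma model may differ.

```typescript
// Sketch of a WorkflowInsight record as described by the table above.
type InsightType = "pain_point" | "strength" | "solution" | "pattern" | "decision";
type Severity = "blocker" | "high" | "medium" | "low";

interface WorkflowInsight {
  id: string;                // UUID
  title: string;             // insight headline
  detail: string;            // full description
  suggestion?: string;       // recommended action
  category: string;          // domain area (Security, Architecture, ...)
  insightType: InsightType;
  severity?: Severity;
  pairedInsightId?: string;  // link to matching pain/solution
  embedding?: number[];      // 1536-dim vector; absent when no API key is configured
  // searchVector is maintained by PostgreSQL and not exposed to app code
}

const example: WorkflowInsight = {
  id: "00000000-0000-0000-0000-000000000001",
  title: "SQL Injection in User Input",
  detail: "The login form passes unsanitized user input directly to a raw SQL query.",
  suggestion: "Use parameterized queries or Prisma's query builder.",
  category: "Security",
  insightType: "pain_point",
  severity: "high",
};
```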
WorkflowInsights are injected into future workflows via `{{memory}}` (you explicitly link them via the MemoryPicker) and `{{project.wisdom}}` (automatically, per project).
## The Embedding Pipeline
When a WorkflowInsight is saved, it goes through an embedding pipeline:
1. A structured text string is built from the insight's fields:

   ```text
   [pain_point] [Security] [high] SQL Injection in User Input:
   The login form passes unsanitized user input directly to a raw SQL query.
   Suggestion: Use parameterized queries or Prisma's query builder.
   ```

2. OpenAI's `text-embedding-3-small` generates a 1536-dimensional vector.

3. The vector is stored via raw SQL (Prisma doesn't support the `vector` type natively):

   ```sql
   UPDATE workflow_insights
   SET embedding = $1::vector
   WHERE id = $2::uuid
   ```

4. An HNSW index (`m=16`, `ef_construction=64`, `vector_cosine_ops`) enables efficient approximate nearest-neighbor search.
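The text-construction step can be sketched as a small helper. `buildEmbeddingText` is a hypothetical name, not the actual nyxCore function, and the exact formatting is inferred from the example above.

```typescript
// Sketch: build the structured "[type] [category] [severity] title: detail"
// string that gets embedded. Hypothetical helper, not nyxCore's real code.
interface InsightFields {
  title: string;
  detail: string;
  suggestion?: string;
  category: string;
  insightType: string;
  severity?: string;
}

function buildEmbeddingText(i: InsightFields): string {
  const tags = [i.insightType, i.category, i.severity]
    .filter((t): t is string => Boolean(t))
    .map((t) => `[${t}]`)
    .join(" ");
  const lines = [`${tags} ${i.title}:`, i.detail];
  if (i.suggestion) lines.push(`Suggestion: ${i.suggestion}`);
  return lines.join("\n");
}
```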
Embedding requires an OpenAI API key in the tenant's vault. If no key is configured, insights are stored without embeddings and fall back to text-only search.
## Hybrid Search Algorithm
The search system combines two signals:
$$S_{hybrid} = 0.7 \cdot S_{vector} + 0.3 \cdot S_{text}$$
Where:
- $S_{vector} = 1 - d_{cosine}(\mathbf{q}, \mathbf{v})$ — cosine similarity from pgvector
- $S_{text} = \mathrm{ts\_rank}(\mathbf{v}_{search}, \mathrm{websearch\_to\_tsquery}(\text{'english'}, q))$
The 70/30 split prioritizes semantic understanding. A query for "authentication bypass" finds insights about "broken access control" because they're semantically close — even if the exact words don't match. The 30% text weight boosts results that contain the specific terms you searched.
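The weighting itself is a one-liner. A sketch, assuming a record missing from either result set contributes 0 for that signal (matching the `COALESCE` in the SQL below):

```typescript
// Sketch of the 70/30 hybrid score. Missing signals default to 0,
// like COALESCE(..., 0) in the SQL implementation.
function hybridScore(vectorScore: number | null, textScore: number | null): number {
  return 0.7 * (vectorScore ?? 0) + 0.3 * (textScore ?? 0);
}
```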
SQL implementation (simplified):
```sql
WITH vector_matches AS (
  SELECT id, 1 - (embedding <=> $query_vector::vector) AS vector_score
  FROM workflow_insights
  WHERE "tenantId" = $tenantId AND embedding IS NOT NULL
  ORDER BY embedding <=> $query_vector::vector
  LIMIT 50
),
text_matches AS (
  SELECT id, ts_rank("searchVector", websearch_to_tsquery('english', $query)) AS text_score
  FROM workflow_insights
  WHERE "tenantId" = $tenantId
    AND "searchVector" @@ websearch_to_tsquery('english', $query)
  LIMIT 50
)
SELECT *, COALESCE(vm.vector_score, 0) * 0.7 + COALESCE(tm.text_score, 0) * 0.3 AS score
FROM workflow_insights
...
```
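The merge of the two candidate lists can also be sketched outside the database, as a full outer join keyed by insight id. This is a hypothetical in-memory version; the production system does the merge in SQL.

```typescript
// Sketch: combine vector and text candidate lists into one ranked list.
// A missing signal counts as 0, mirroring COALESCE in the SQL version.
interface Scored { id: string; score: number; }

function mergeHybrid(
  vectorMatches: Record<string, number>,
  textMatches: Record<string, number>,
): Scored[] {
  const merged: Record<string, number> = {};
  for (const id of Object.keys(vectorMatches)) {
    merged[id] = 0.7 * vectorMatches[id];
  }
  for (const id of Object.keys(textMatches)) {
    merged[id] = (merged[id] ?? 0) + 0.3 * textMatches[id];
  }
  return Object.keys(merged)
    .map((id) => ({ id, score: merged[id] }))
    .sort((a, b) => b.score - a.score);
}
```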
## Automatic Pain-Solution Pairing
After insights are saved, the `pairInsights()` function runs automatically. It pairs pain points with strengths that share the same category:

- A `pain_point` in the Security category gets paired with a `strength` in the Security category
- The link is bidirectional: `pairedInsightId` is set on both records
This creates balanced context. When a future workflow injects this insight pair, it gets both the problem and the known working solution in the same category — resisting the LLM's tendency toward one-sided analysis.
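The pairing step can be sketched as follows. This is a hypothetical shape for `pairInsights()`, assuming a simple first-match policy within each category; the real implementation may select matches differently.

```typescript
// Sketch: pair each unpaired pain_point with an unpaired strength in the
// same category, setting pairedInsightId on both records (bidirectional).
interface PairableInsight {
  id: string;
  category: string;
  insightType: string;
  pairedInsightId?: string;
}

function pairInsights(insights: PairableInsight[]): void {
  const strengths = insights.filter(
    (i) => i.insightType === "strength" && !i.pairedInsightId,
  );
  for (const pain of insights) {
    if (pain.insightType !== "pain_point" || pain.pairedInsightId) continue;
    const match = strengths.filter(
      (s) => s.category === pain.category && !s.pairedInsightId,
    )[0];
    if (!match) continue;
    pain.pairedInsightId = match.id; // link is set on both records
    match.pairedInsightId = pain.id;
  }
}
```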
## Cross-Project Pattern Detection
When a blocker or high-severity pain point is saved, the cross-project scanner fires. It runs a vector similarity search against all insights in the tenant's other projects:
```sql
SELECT id, title, 1 - (embedding <=> $query_vector::vector) AS similarity
FROM workflow_insights
WHERE "projectId" = $otherProjectId
  AND embedding IS NOT NULL
  AND 1 - (embedding <=> $query_vector::vector) >= 0.65
```
Matches above the 0.65 threshold trigger automatic action point creation in the matching project. The action point records the cross-project detection, the similarity score, and both projects — so you know which project the pattern originated from.
A deduplication guard prevents duplicate action points if the same scan runs again.
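The threshold check and deduplication guard can be sketched together. `selectActionable` and the `alreadyCreated` list are illustrative names, not nyxCore's actual API.

```typescript
// Sketch: keep cross-project matches at or above the 0.65 similarity
// threshold, skipping any insight that already produced an action point
// (the deduplication guard).
interface CrossProjectMatch { id: string; title: string; similarity: number; }

const SIMILARITY_THRESHOLD = 0.65;

function selectActionable(
  matches: CrossProjectMatch[],
  alreadyCreated: string[], // insight ids with an existing action point
): CrossProjectMatch[] {
  return matches.filter(
    (m) =>
      m.similarity >= SIMILARITY_THRESHOLD &&
      alreadyCreated.indexOf(m.id) === -1,
  );
}
```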
## Injecting Memory into Workflows
**Via `{{memory}}`:** You link specific insights to a workflow using the MemoryPicker. At runtime, `loadMemoryContent()` formats the linked insights into a markdown block injected into `{{memory}}`. You control exactly which past findings appear.

**Via `{{project.wisdom}}`:** When a workflow is linked to a project, all insights accumulated for that project are automatically formatted and injected. No manual selection needed — the entire project knowledge base is available.
Both injection paths pass through `sanitizeContextContent()`, which escapes `{{` sequences to prevent recursive template resolution.
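The escaping step amounts to neutralizing `{{` before injection. A minimal sketch; the actual escape form used by `sanitizeContextContent()` is not documented here, so the backslash escape below is an assumption.

```typescript
// Sketch: escape "{{" so the template resolver won't treat injected
// memory content as a placeholder. The backslash escape is an assumed
// format, not necessarily what nyxCore emits.
function sanitizeContextContent(content: string): string {
  return content.replace(/\{\{/g, "\\{\\{");
}
```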
## GitHub Memory Sync
If you store session checkpoint files (`.memory/letter_YYYYMMDD_XXXX.md`) in a GitHub repository, the sync pipeline imports them as MemoryEntries. Each file is parsed for:

- **Title:** from the first `#` heading, YAML `title:` field, or filename
- **Tags:** from YAML `tags: [...]` or default `["memory", "imported"]`
- **Date:** from the `letter_YYYYMMDD` filename pattern
The `sourceRef` field records the canonical path (`github://owner/repo/.memory/letter_YYYYMMDD_XXXX.md`) for deduplication — the same file is never imported twice for the same user.
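The date-extraction step can be sketched as a filename parser. `parseLetterDate` is a hypothetical helper; the actual sync parser may behave differently on malformed names.

```typescript
// Sketch: extract the date from the letter_YYYYMMDD filename pattern.
// Returns null when the filename doesn't match the pattern.
function parseLetterDate(filename: string): Date | null {
  const m = /letter_(\d{4})(\d{2})(\d{2})/.exec(filename);
  if (!m) return null;
  const year = Number(m[1]);
  const month = Number(m[2]); // 1-based in the filename
  const day = Number(m[3]);
  return new Date(Date.UTC(year, month - 1, day));
}
```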
