Memory & Knowledge System

nyxCore implements a two-layer memory system. The first layer is explicit: you curate its entries manually. The second layer is machine-extracted from workflow review steps and grows automatically with every run.

Layer 1: MemoryEntry (Manual Knowledge Base)

MemoryEntry is the general knowledge base. You create entries with titles, content, and tags. Content can originate from three sources:

  • "manual" — created directly in the dashboard
  • "github" — synced from .memory/letter_*.md files in a GitHub repository
  • "import" — bulk imported

MemoryEntries are full-text searchable via PostgreSQL tsvector. They're not directly injected into workflows — they're a reference library you can consult from the dashboard.

Layer 2: WorkflowInsight (Structured, Vectorized)

WorkflowInsight is the machine-extracted layer. These records are created when you approve key points from a workflow review step.

| Field | Type | Purpose |
|---|---|---|
| title | String | Insight headline |
| detail | Text | Full description |
| suggestion | Text? | Recommended action |
| category | String | Domain area (Security, Architecture, ...) |
| insightType | String | pain_point, strength, solution, pattern, decision |
| severity | String? | blocker, high, medium, low |
| pairedInsightId | UUID? | Link to matching pain/solution |
| embedding | vector(1536) | For semantic search |
| searchVector | tsvector | For full-text search |
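
As a rough sketch, the record shape above could be expressed as a TypeScript type. Field names follow the table; the id field and the null-handling of optional columns are assumptions, and searchVector is omitted since PostgreSQL maintains it:

```typescript
// Hypothetical TypeScript shape mirroring the WorkflowInsight table above.
// Optional columns (Text?, String?, UUID?) become `| null`.
type InsightType = "pain_point" | "strength" | "solution" | "pattern" | "decision";
type Severity = "blocker" | "high" | "medium" | "low";

interface WorkflowInsight {
  id: string;                     // UUID primary key (assumed)
  title: string;                  // insight headline
  detail: string;                 // full description
  suggestion: string | null;      // recommended action
  category: string;               // domain area, e.g. "Security"
  insightType: InsightType;
  severity: Severity | null;
  pairedInsightId: string | null; // link to matching pain/solution
  embedding: number[] | null;     // vector(1536); null when no OpenAI key is set
}

const example: WorkflowInsight = {
  id: "00000000-0000-0000-0000-000000000001",
  title: "SQL Injection in User Input",
  detail: "The login form passes unsanitized user input directly to a raw SQL query.",
  suggestion: "Use parameterized queries or Prisma's query builder.",
  category: "Security",
  insightType: "pain_point",
  severity: "high",
  pairedInsightId: null,
  embedding: null,
};
```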

WorkflowInsights are injected into future workflows via {{memory}} (you explicitly link them via the MemoryPicker) and {{project.wisdom}} (automatically per-project).

The Embedding Pipeline

When a WorkflowInsight is saved, it goes through an embedding pipeline:

  1. A structured text string is built from the insight's fields:

     [pain_point] [Security] [high] SQL Injection in User Input:
     The login form passes unsanitized user input directly to a raw SQL query.
     Suggestion: Use parameterized queries or Prisma's query builder.

  2. OpenAI's text-embedding-3-small generates a 1536-dimensional vector.

  3. The vector is stored via raw SQL (Prisma doesn't support the vector type natively):

     UPDATE workflow_insights
     SET embedding = $1::vector
     WHERE id = $2::uuid

  4. An HNSW index (m=16, ef_construction=64, vector_cosine_ops) enables efficient approximate nearest-neighbor search.

Embedding requires an OpenAI API key in the tenant's vault. If no key is configured, insights are stored without embeddings and fall back to text-only search.
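
The structured string from step 1 can be sketched as a small formatter. The function name and exact layout are assumptions inferred from the example above, not the actual implementation:

```typescript
// Hypothetical sketch of the embedding-input builder from step 1.
// Layout: [insightType] [category] [severity] Title:\ndetail (+ optional suggestion line).
interface InsightFields {
  title: string;
  detail: string;
  suggestion?: string;
  category: string;
  insightType: string;
  severity?: string;
}

function buildEmbeddingText(i: InsightFields): string {
  const prefixes = [`[${i.insightType}]`, `[${i.category}]`];
  if (i.severity) prefixes.push(`[${i.severity}]`); // severity is optional
  let text = `${prefixes.join(" ")} ${i.title}:\n${i.detail}`;
  if (i.suggestion) text += `\nSuggestion: ${i.suggestion}`;
  return text;
}
```

Packing type, category, and severity into the embedded text means the vector encodes not just what the insight says but what kind of insight it is, which sharpens nearest-neighbor matches.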

Hybrid Search Algorithm

The search system combines two signals:

$$S_{hybrid} = 0.7 \cdot S_{vector} + 0.3 \cdot S_{text}$$

Where:

  • $S_{vector} = 1 - d_{cosine}(\mathbf{q}, \mathbf{v})$ — cosine similarity from pgvector
  • $S_{text} = \mathrm{ts\_rank}(\mathbf{v}_{search}, \mathrm{websearch\_to\_tsquery}(\text{'english'}, q))$

The 70/30 split prioritizes semantic understanding. A query for "authentication bypass" finds insights about "broken access control" because they're semantically close — even if the exact words don't match. The 30% text weight boosts results that contain the specific terms you searched.
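
The score combination itself is simple arithmetic. A minimal sketch, with an in-process cosine distance standing in for pgvector's <=> operator and a precomputed text score standing in for ts_rank:

```typescript
// Cosine distance, as pgvector's `<=>` operator computes it: 1 - cos(q, v).
function cosineDistance(q: number[], v: number[]): number {
  let dot = 0, nq = 0, nv = 0;
  for (let i = 0; i < q.length; i++) {
    dot += q[i] * v[i];
    nq += q[i] * q[i];
    nv += v[i] * v[i];
  }
  return 1 - dot / (Math.sqrt(nq) * Math.sqrt(nv));
}

// The 70/30 hybrid score from the formula above.
function hybridScore(query: number[], vec: number[], textScore: number): number {
  const vectorScore = 1 - cosineDistance(query, vec); // cosine similarity
  return 0.7 * vectorScore + 0.3 * textScore;
}
```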

SQL implementation (simplified):

WITH vector_matches AS (
  SELECT id, 1 - (embedding <=> $query_vector::vector) AS vector_score
  FROM workflow_insights
  WHERE "tenantId" = $tenantId AND embedding IS NOT NULL
  ORDER BY embedding <=> $query_vector::vector
  LIMIT 50
),
text_matches AS (
  SELECT id, ts_rank("searchVector", websearch_to_tsquery('english', $query)) AS text_score
  FROM workflow_insights
  WHERE "tenantId" = $tenantId
    AND "searchVector" @@ websearch_to_tsquery('english', $query)
  LIMIT 50
)
SELECT *, COALESCE(vm.vector_score, 0) * 0.7 + COALESCE(tm.text_score, 0) * 0.3 AS score
FROM workflow_insights
...

Automatic Pain-Solution Pairing

After insights are saved, the pairInsights() function runs automatically. It pairs pain points with strengths that share the same category:

  • A pain_point in the Security category gets paired with a strength in the Security category
  • The link is bidirectional: pairedInsightId is set on both records

This creates balanced context. When a future workflow injects this insight pair, it gets both the problem and the known working solution in the same category — resisting the LLM's tendency toward one-sided analysis.
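
A minimal sketch of what pairInsights() could look like, assuming the greedy category-matching described above (the real function's matching order and tie-breaking are not specified here):

```typescript
// Hypothetical sketch: link each unpaired pain_point to an unpaired strength
// in the same category, setting pairedInsightId on both records.
interface Pairable {
  id: string;
  category: string;
  insightType: string;
  pairedInsightId: string | null;
}

function pairInsights(insights: Pairable[]): void {
  // Index unpaired strengths by category.
  const strengths = new Map<string, Pairable[]>();
  for (const i of insights) {
    if (i.insightType === "strength" && i.pairedInsightId === null) {
      const list = strengths.get(i.category) ?? [];
      list.push(i);
      strengths.set(i.category, list);
    }
  }
  // Greedily pair each unpaired pain point with a same-category strength.
  for (const pain of insights) {
    if (pain.insightType !== "pain_point" || pain.pairedInsightId !== null) continue;
    const match = strengths.get(pain.category)?.shift();
    if (match) {
      pain.pairedInsightId = match.id;  // bidirectional link:
      match.pairedInsightId = pain.id;  // set on both records
    }
  }
}
```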

Cross-Project Pattern Detection

When a blocker or high-severity pain point is saved, the cross-project scanner fires. It runs a vector similarity search against all insights in the tenant's other projects:

SELECT id, title, 1 - (embedding <=> query::vector) AS similarity
FROM workflow_insights
WHERE "projectId" = otherProject_id
  AND embedding IS NOT NULL
  AND 1 - (embedding <=> query::vector) >= 0.65

Matches above the 0.65 threshold trigger automatic action point creation in the matching project. The action point records the cross-project detection, the similarity score, and both projects — so you know which project the pattern originated from.

A deduplication guard prevents duplicate action points if the same scan runs again.
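
The scanner's decision step (threshold filter plus dedup guard) can be sketched as follows. The dedup key format and field names are assumptions for illustration:

```typescript
// Hypothetical sketch of the cross-project scanner's decision step.
// `similarity` is 1 - cosine distance, as computed by pgvector in the query above.
const SIMILARITY_THRESHOLD = 0.65;

interface Match {
  insightId: string;  // the similar insight in the other project
  projectId: string;  // the project where the match was found
  similarity: number;
}

function selectActionPoints(matches: Match[], existingKeys: Set<string>): Match[] {
  return matches.filter((m) => {
    if (m.similarity < SIMILARITY_THRESHOLD) return false;
    const key = `${m.projectId}:${m.insightId}`; // assumed dedup key
    if (existingKeys.has(key)) return false;     // guard against repeated scans
    existingKeys.add(key);
    return true;
  });
}
```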

Injecting Memory into Workflows

Via {{memory}}: You link specific insights to a workflow using the MemoryPicker. At runtime, loadMemoryContent() formats the linked insights into a markdown block injected into {{memory}}. You control exactly which past findings appear.

Via {{project.wisdom}}: When a workflow is linked to a project, all insights accumulated for that project are automatically formatted and injected. No manual selection needed — the entire project knowledge base is available.

Both injection paths pass through sanitizeContextContent() which escapes {{ sequences to prevent recursive template resolution.
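
A minimal sketch of that sanitization step; the exact escape sequence used by sanitizeContextContent() is an assumption, but the invariant is that no literal {{ survives into injected content:

```typescript
// Hypothetical sketch: escape "{{" so injected memory text can never be
// re-expanded by a second round of template resolution.
function sanitizeContextContent(content: string): string {
  return content.replace(/\{\{/g, "\\{\\{");
}
```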

GitHub Memory Sync

If you store session checkpoint files (.memory/letter_YYYYMMDD_XXXX.md) in a GitHub repository, the sync pipeline imports them as MemoryEntries. Each file is parsed for:

  • Title: from the first # heading, YAML title: field, or filename
  • Tags: from YAML tags: [...] or default ["memory", "imported"]
  • Date: from the letter_YYYYMMDD filename pattern

The sourceRef field records the canonical path (github://owner/repo/.memory/letter_YYYYMMDD_XXXX.md) for deduplication — the same file is never imported twice for the same user.
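
The filename-based parts of that parsing can be sketched as below. The function name is hypothetical, and YAML frontmatter handling is omitted; only the filename-derived title and date fallbacks from the list above are shown:

```typescript
// Hypothetical sketch of the filename fallbacks in GitHub memory sync:
// date from the letter_YYYYMMDD pattern, title from the filename.
function parseMemoryFilename(path: string): { title: string; date: string | null } {
  const file = path.split("/").pop() ?? path;
  const m = file.match(/^letter_(\d{4})(\d{2})(\d{2})/);
  const date = m ? `${m[1]}-${m[2]}-${m[3]}` : null; // ISO date from letter_YYYYMMDD
  return { title: file.replace(/\.md$/, ""), date };
}
```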