CKB Integration into nyxCore: A Structural Code Analysis Backend for LLM-Augmented Development Platforms

Date: 2026-03-19
Authors: Oliver Baer, Claude
Version: 1.0
Classification: Technical Architecture Document

Abstract

This document presents the design and implementation of the CKB (Code Knowledge Backend) integration into nyxCore, a multi-tenant SaaS platform for LLM-powered software development workflows. The integration addresses a critical dependency on external LLM APIs for code analysis by introducing a self-hosted, LLM-independent structural analysis engine. CKB runs as a Docker worker container; the nyxCore application invokes its CLI through docker exec, spawned with child_process.execFile, against a shared volume of cloned repositories. Analysis results are cached in PostgreSQL as JSON and surfaced through three channels: a dedicated Code Intelligence UI, a {{ckb}} template variable for workflow prompt injection, and 13 tRPC API procedures. The architecture employs a worker pattern (rather than HTTP server mode) to support concurrent analysis of multiple repositories, with security enforced through path traversal prevention, Docker socket read-only mounting, transient token passthrough, and PostgreSQL Row-Level Security tenant isolation.

1. Introduction

1.1 Problem Statement

nyxCore's code analysis capabilities depend on external LLM APIs (Anthropic, OpenAI, Google) for pattern recognition, code review, and architectural reasoning. This dependency introduces three categories of operational risk:

Cost: Each analysis consumes 10,000-100,000 tokens depending on repository size, translating to $0.05-0.50 per invocation at current API pricing. At scale (100 analyses per day), this amounts to $5-50 per day, or $150-1,500 over a 30-day month, and more at higher volumes, for a capability that does not inherently require natural language understanding.
Availability: LLM API rate limits (HTTP 429 responses) and credit exhaustion create downtime windows where code analysis is entirely unavailable. This was observed in production when OpenAI credit balances were exhausted, disabling all embedding-dependent features simultaneously.
Latency: LLM-based code analysis requires 5-30 seconds per invocation due to network round-trips, token generation time, and prompt processing. Structural queries (e.g., "which files have the highest cyclomatic complexity?") do not benefit from language model reasoning and are unnecessarily slow when routed through LLM inference.

1.2 Solution Overview

CKB (Code Knowledge Backend) is a purpose-built structural code analysis tool that provides architecture detection, risk hotspot identification, security auditing, dead code detection, coupling analysis, complexity metrics, ownership tracking, and PR impact analysis — entirely without LLM calls. It operates on locally cloned repositories using static analysis, git log mining, and graph-based dependency resolution.

The integration runs CKB as a sidecar Docker container alongside the nyxCore application stack. The nyxCore application server communicates with CKB by executing CLI commands inside the container via the Docker socket, parsing JSON output, and caching results in the existing PostgreSQL database. This approach eliminates per-analysis API charges (the only cost is compute), serves cached results in under a second, and is fully independent of external API availability.

2. Architecture

2.1 Worker Pattern

CKB provides two operational modes: an HTTP Index-Server for SCIP-based symbol/reference queries, and a CLI interface for structural analysis. The analysis endpoints (arch, hotspots, audit, coupling, complexity, callgraph, dead-code, ownership, impact, pr-summary, search) operate exclusively through the CLI against a local repository checkout. The Index-Server mode does not expose these analysis capabilities.

This architectural constraint dictates the worker pattern: CKB runs as a long-lived container with sleep infinity as its entrypoint (no server process), and the nyxCore application executes commands inside the container via docker exec. This design has three advantages over an HTTP server approach:

Multi-repository support: Each docker exec call specifies the target repository path, enabling concurrent analysis across all tenant repositories without server reconfiguration.
No connection management: There are no HTTP connections to maintain, no connection pools to configure, and no health-check endpoints to monitor beyond the binary availability check (ckb version).
Process isolation: Each analysis runs as an independent process within the container. A crash or timeout in one analysis does not affect others.

The trade-off is the overhead of process spawning per command (approximately 50-100ms for docker exec initialization), which is negligible compared to the analysis computation time.
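The invocation pattern described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual nyxCore client: the function names buildDockerExecArgs and runCkb, the use of docker exec -w to select the repository, the 60-second timeout, and the buffer size are all hypothetical. Only docker exec, execFile, and the --format json flag are taken from this document.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Build the argument vector for `docker exec`.
// Assumption: the repository is selected via the working directory (-w);
// the real CKB CLI may instead take the path as an argument.
export function buildDockerExecArgs(
  container: string,
  command: string,
  repoPath: string,
  extraArgs: string[] = [],
): string[] {
  return ["exec", "-w", repoPath, container, "ckb", command, "--format", "json", ...extraArgs];
}

// Run one CKB command inside the worker container and parse its JSON output.
export async function runCkb(
  container: string,
  command: string,
  repoPath: string,
  timeoutMs = 60000, // hypothetical default
): Promise<unknown> {
  const args = buildDockerExecArgs(container, command, repoPath);
  // execFile spawns no shell, so arguments are never shell-interpreted.
  const { stdout } = await execFileAsync("docker", args, {
    timeout: timeoutMs,
    maxBuffer: 16 * 1024 * 1024, // analysis output can be large
  });
  return JSON.parse(stdout);
}
```

Because each call spawns its own process, a timeout or crash surfaces as a rejected promise for that call only, which matches the process isolation property listed above.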

2.2 Data Flow

The complete data flow from repository linking to workflow injection follows this path:

┌──────────────────────────────────────────────────────────────────────┐
│                        nyxCore Application                           │
│                                                                      │
│  ┌─────────────┐    ┌──────────────┐    ┌─────────────────────────┐  │
│  │  tRPC Router │───>│  CKB Client  │───>│  docker exec ckb <cmd> │  │
│  │  (13 procs)  │    │  (execFile)  │    │  --format json         │  │
│  └─────┬───────┘    └──────┬───────┘    └────────────┬────────────┘  │
│        │                   │                          │               │
│        │            ┌──────▼───────┐          ┌──────▼──────────┐    │
│        │            │  JSON Parse  │          │  CKB Container  │    │
│        │            │  + Validate  │          │  (worker mode)  │    │
│        │            └──────┬───────┘          │                 │    │
│        │                   │                  │  /data/repos/   │    │
│        │            ┌──────▼───────┐          │   ├─ <projA>/   │    │
│        │            │  PostgreSQL  │          │   ├─ <projB>/   │    │
│        │            │  (analysis   │          │   └─ <projN>/   │    │
│        │            │   Cache)     │          └─────────────────┘    │
│        │            └──────┬───────┘                                  │
│        │                   │                                          │
│  ┌─────▼───────┐    ┌─────▼────────┐    ┌──────────────────────┐    │
│  │  UI Cards   │    │ Content Loader│───>│  Redis Cache         │    │
│  │  (React)    │    │ ({{ckb}})    │    │  (1h TTL, 8K chars)  │    │
│  └─────────────┘    └──────────────┘    └──────────────────────┘    │
└──────────────────────────────────────────────────────────────────────┘

Phase 1 — Clone and Index:

Project links a GitHub repository (owner/repo stored on Project model).
isAvailable() checks CKB container responsiveness via docker exec nyxcore-ckb ckb version (the container name is configured through CKB_CONTAINER_NAME).
cloneRepo() executes git clone --depth 1 inside the container into /data/repos/<projectId>.
runFullAnalysis() sequentially invokes arch, hotspots, audit, and dead-code commands.
Results are stored as a JSON blob in ProjectCkbIndex.analysisCache.
Status transitions: pending -> cloning -> indexing -> ready (or failed).
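The Phase 1 status lifecycle and the sequential analysis pass can be sketched as a small state table plus a loop. This is an illustrative sketch only: the type and function signatures are hypothetical, the assumption that every in-progress state may move to failed is an interpretation of "(or failed)", and re-analysis transitions out of ready are not modelled because the document does not describe them.

```typescript
type IndexStatus = "pending" | "cloning" | "indexing" | "ready" | "failed";

// Forward transitions from the text; "failed" is assumed reachable from any
// non-terminal state.
const TRANSITIONS: Record<IndexStatus, IndexStatus[]> = {
  pending: ["cloning", "failed"],
  cloning: ["indexing", "failed"],
  indexing: ["ready", "failed"],
  ready: [],
  failed: [],
};

function canTransition(from: IndexStatus, to: IndexStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// The four commands that the text says are run during the full analysis.
const FULL_ANALYSIS_COMMANDS = ["arch", "hotspots", "audit", "dead-code"] as const;

// Hypothetical runner: executes one CKB command and resolves to parsed JSON.
type Runner = (command: string) => Promise<unknown>;

// Invoke the commands one after another and collect results into one object,
// mirroring the single JSON blob stored in analysisCache.
async function runFullAnalysis(run: Runner): Promise<Record<string, unknown>> {
  const cache: Record<string, unknown> = {};
  for (const command of FULL_ANALYSIS_COMMANDS) {
    cache[command] = await run(command); // sequential, not parallel
  }
  return cache;
}
```

Keeping the transition table as data makes it straightforward to reject an out-of-order status update before it is written to the database.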

Phase 2 — Serve:

UI: tRPC queries read from analysisCache (overview cards) or invoke CKB live (detail queries).
Workflows: loadCkbContent() reads from Redis cache (or DB fallback), formats as markdown, injects into {{ckb}} template variable.
Refresh: Manual re-analyze or webhook triggers git pull + runFullAnalysis() + cache invalidation.

2.3 Security Model

The CKB integration introduces four security boundaries:

Path Traversal Prevention. All repository path arguments are validated against a strict prefix (/data/repos/) and rejected if they contain .. sequences. The deleteRepo() function additionally enforces that the path is exactly one level deep under the base directory, preventing wildcard deletions.
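The two checks described above can be sketched as pure string predicates. The function names and the project ID alphabet are assumptions made for illustration; the prefix, the rejection of .. sequences, and the one-level-deep rule come from the text.

```typescript
const REPOS_BASE = "/data/repos/";

// Assumed alphabet for project IDs; the actual ID format is not specified.
const PROJECT_ID = /^[A-Za-z0-9_-]+$/;

// Accept only paths under the base directory that contain no ".." sequence.
function isSafeRepoPath(repoPath: string): boolean {
  return repoPath.startsWith(REPOS_BASE) && !repoPath.includes("..");
}

// Accept only paths exactly one level below the base, such as
// /data/repos/<projectId>. The ID pattern also rejects ".", "*", and
// anything containing a further "/".
function isDeletableRepoPath(repoPath: string): boolean {
  if (!isSafeRepoPath(repoPath)) return false;
  const rest = repoPath.slice(REPOS_BASE.length);
  return PROJECT_ID.test(rest);
}
```

For example, isDeletableRepoPath("/data/repos/proj_1") is true, while "/data/repos/", "/data/repos/proj_1/src", and "/data/repos/*" are all rejected.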

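Docker Socket Read-Only Mount. The Docker socket is mounted read-only (/var/run/docker.sock:ro) in the nyxCore application container. Note that the :ro flag applies to the mounted socket file, not to the Docker API: a process that can connect to the socket can still issue any API call, including creating, starting, or stopping containers. Limiting the application to exec calls against the pre-existing CKB container therefore requires an additional control in front of the socket, such as an API-filtering proxy.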

Transient Token Passthrough. For repositories requiring authentication, a short-lived GitHub token is passed as an environment variable to the docker exec command rather than being stored in the CKB container or on disk. The token exists only for the duration of the clone/pull operation.

PostgreSQL RLS Tenant Isolation. The ProjectCkbIndex table is covered by Row-Level Security policies. Each tenant's analysis cache is completely isolated from other tenants' data at the database layer.

3. Analysis Commands

CKB exposes 11 analysis commands via its CLI:

Command	Description	Mode
`arch`	Architecture detection, module boundaries, dependency graph	Cached
`hotspots`	Risk hotspots by churn rate + complexity	Cached
`audit`	Security audit: hardcoded secrets, unsafe patterns	Cached
`dead-code`	Unreachable functions, unused exports	Cached
`coupling`	Coupling analysis between modules	Live
`complexity`	Cyclomatic complexity per file/function	Live
`callgraph`	Function call graph for a specific entrypoint	Live
`ownership`	Git blame-based ownership per file	Live
`impact`	Impact analysis for a changed file	Live
`pr-summary`	Summary of changes in a PR diff	Live
`search`	Symbol/identifier search across the codebase	Live

"Cached" commands are run during runFullAnalysis() and stored in ProjectCkbIndex.analysisCache. "Live" commands are invoked on-demand by specific tRPC queries.
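One way to keep the cached/live split in a single place is a typed lookup table, sketched below. The type and function names are hypothetical; the command names and their classification are taken from the table above.

```typescript
type CkbCommand =
  | "arch" | "hotspots" | "audit" | "dead-code"
  | "coupling" | "complexity" | "callgraph" | "ownership"
  | "impact" | "pr-summary" | "search";

type ExecutionMode = "cached" | "live";

// Record<CkbCommand, ...> makes the compiler flag any command that is
// missing from the table.
const COMMAND_MODES: Record<CkbCommand, ExecutionMode> = {
  "arch": "cached",
  "hotspots": "cached",
  "audit": "cached",
  "dead-code": "cached",
  "coupling": "live",
  "complexity": "live",
  "callgraph": "live",
  "ownership": "live",
  "impact": "live",
  "pr-summary": "live",
  "search": "live",
};

// List the commands for one mode, in table order.
function commandsByMode(mode: ExecutionMode): CkbCommand[] {
  return (Object.keys(COMMAND_MODES) as CkbCommand[]).filter(
    (command) => COMMAND_MODES[command] === mode,
  );
}
```

With this shape, the full-analysis pass can iterate over commandsByMode("cached") instead of repeating the list of command names.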

4. Template Variable Integration

The {{ckb}} template variable injects a formatted markdown summary of the CKB analysis into workflow prompts. This enables workflows to reference:

Architecture overview: module structure, dependency counts, entry points
Risk hotspots: top-N files by combined churn + complexity score
Security findings: hardcoded credentials, unsafe function calls
Dead code inventory: unused exports, unreachable code paths

Loading logic (loadCkbContent()):

Check Redis cache (1h TTL, 8K character limit)
On cache miss, query ProjectCkbIndex.analysisCache from PostgreSQL
Format JSON analysis results as markdown sections
Store formatted markdown in Redis
Return to template resolver
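The five steps above amount to a read-through cache. A minimal sketch follows, with the Redis client and the database query abstracted behind hypothetical interfaces. The cache key scheme, the reading of "8K" as 8,000 characters, and the markdown layout (one section per top-level key with a JSON body) are assumptions; the actual formatter is not described in this document.

```typescript
// Hypothetical minimal cache interface; a Redis client would sit behind it.
interface KeyValueCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

type AnalysisCache = Record<string, unknown>;

const TTL_SECONDS = 60 * 60; // 1h TTL, from the text
const MAX_CHARS = 8000;      // "8K" interpreted as 8,000 characters (assumption)

// Render each top-level analysis key as a markdown section, then truncate.
function formatAsMarkdown(analysis: AnalysisCache): string {
  const sections = Object.entries(analysis).map(
    ([name, data]) => `## ${name}\n\n${JSON.stringify(data, null, 2)}`,
  );
  return sections.join("\n\n").slice(0, MAX_CHARS);
}

async function loadCkbContent(
  projectId: string,
  cache: KeyValueCache,
  loadFromDb: (projectId: string) => Promise<AnalysisCache | null>,
): Promise<string> {
  const key = `ckb:${projectId}`; // hypothetical key scheme

  // Step 1: try the cache first.
  const hit = await cache.get(key);
  if (hit !== null) return hit;

  // Step 2: fall back to the stored analysis.
  const analysis = await loadFromDb(projectId);
  if (analysis === null) return ""; // nothing indexed yet

  // Steps 3-5: format, store with TTL, return.
  const markdown = formatAsMarkdown(analysis);
  await cache.set(key, markdown, TTL_SECONDS);
  return markdown;
}
```

Truncating before the value is cached means the character limit is applied once per cache fill rather than on every template resolution.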

Cache invalidation triggers:

Manual re-analyze via the Code Intelligence UI
Webhook from GitHub (push events on the tracked branch)
git pull during a sync cycle that detects changed files
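For the webhook trigger, the decision of whether a push should invalidate the cache reduces to comparing the event's ref with the tracked branch. The sketch below assumes the standard GitHub push payload field ref (of the form refs/heads/<branch>); the function names and the invalidate callback are hypothetical.

```typescript
// Only the field used here; the real push payload carries many more.
interface PushEvent {
  ref: string; // e.g. "refs/heads/main"
}

// True when the push targets exactly the tracked branch (not a tag, and not
// another branch whose name merely ends with the same segment).
function isTrackedBranchPush(event: PushEvent, trackedBranch: string): boolean {
  return event.ref === `refs/heads/${trackedBranch}`;
}

// Hypothetical handler: invalidates only for pushes to the tracked branch and
// reports whether it did so.
async function handlePush(
  event: PushEvent,
  trackedBranch: string,
  invalidate: () => Promise<void>,
): Promise<boolean> {
  if (!isTrackedBranchPush(event, trackedBranch)) return false;
  await invalidate();
  return true;
}
```

Comparing the full ref rather than a suffix avoids treating a push to feature/main or to a tag named main as a push to main.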

5. tRPC Procedures

The ckb router exposes 13 procedures:

Procedure	Type	Description
`status`	query	Check CKB container availability
`getIndex`	query	Get cached analysis for a project
`startClone`	mutation	Clone repo + run full analysis
`reanalyze`	mutation	Re-run analysis on existing clone
`getArchitecture`	query	Architecture graph with live data
`getHotspots`	query	Risk hotspot rankings
`getAudit`	query	Security audit findings
`getDeadCode`	query	Dead code inventory
`getCoupling`	query	Module coupling metrics
`getComplexity`	query	File/function complexity
`getOwnership`	query	File ownership by contributor
`searchSymbol`	query	Symbol search across codebase
`deleteIndex`	mutation	Remove clone + cached analysis

6. Docker Compose Configuration

services:
  ckb:
    image: ghcr.io/simplyliz/ckb:latest
    container_name: nyxcore-ckb
    entrypoint: ["sleep", "infinity"]
    volumes:
      - ckb_repos:/data/repos
    networks:
      - nyxcore-net
    restart: unless-stopped

  app:
    # ...
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ckb_repos:/data/repos
    environment:
      CKB_CONTAINER_NAME: nyxcore-ckb
      CKB_REPOS_PATH: /data/repos

volumes:
  ckb_repos:

The shared ckb_repos volume gives both the app container and the ckb container access to the same repository directory tree. Clones are created by git running inside the ckb container (Section 2.2, Phase 1); the app container sees the same paths and passes them to CKB for analysis.