CKB Integration

CKB Integration into nyxCore: A Structural Code Analysis Backend for LLM-Augmented Development Platforms

Date: 2026-03-19
Authors: Oliver Baer, Claude
Version: 1.0
Classification: Technical Architecture Document


Abstract

This document presents the design and implementation of the CKB (Code Knowledge Backend) integration into nyxCore, a multi-tenant SaaS platform for LLM-powered software development workflows. The integration addresses a critical dependency on external LLM APIs for code analysis by introducing a self-hosted, LLM-independent structural analysis engine. CKB runs as a Docker worker container; the nyxCore application invokes its CLI through docker exec (spawned with child_process.execFile) against a shared volume of cloned repositories. Analysis results are cached in PostgreSQL as JSON and surfaced through three channels: a dedicated Code Intelligence UI, a {{ckb}} template variable for workflow prompt injection, and 13 tRPC API procedures. The architecture uses a worker pattern rather than HTTP server mode to support concurrent multi-repository analysis, with security enforced through path traversal prevention, read-only mounting of the Docker socket, transient token passthrough, and PostgreSQL Row-Level Security tenant isolation.


1. Introduction

1.1 Problem Statement

nyxCore's code analysis capabilities depend on external LLM APIs (Anthropic, OpenAI, Google) for pattern recognition, code review, and architectural reasoning. This dependency introduces three categories of operational risk:

  1. Cost: Each analysis consumes 10,000-100,000 tokens depending on repository size, translating to $0.05-0.50 per invocation at current API pricing. At scale (100+ analyses per day), monthly costs reach $150-1,500 for a capability that does not inherently require natural language understanding.

  2. Availability: LLM API rate limits (HTTP 429 responses) and credit exhaustion create downtime windows where code analysis is entirely unavailable. This was observed in production when OpenAI credit balances were exhausted, disabling all embedding-dependent features simultaneously.

  3. Latency: LLM-based code analysis requires 5-30 seconds per invocation due to network round-trips, token generation time, and prompt processing. Structural queries (e.g., "which files have the highest cyclomatic complexity?") do not benefit from language model reasoning and are unnecessarily slow when routed through LLM inference.
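The monthly cost range in item 1 follows from simple multiplication. A minimal sketch of that arithmetic, assuming a 30-day month and working in integer cents to avoid floating-point rounding (the function name is illustrative, not part of nyxCore):

```typescript
// Illustrative helper; the 30-day month is an assumption, not a figure from the platform.
function monthlyCostCents(
  analysesPerDay: number,
  centsPerAnalysis: number,
  daysPerMonth: number = 30,
): number {
  return analysesPerDay * centsPerAnalysis * daysPerMonth;
}

const lowUsd = monthlyCostCents(100, 5) / 100;   // $0.05 per analysis
const highUsd = monthlyCostCents(100, 50) / 100; // $0.50 per analysis
console.log(`$${lowUsd} to $${highUsd} per month at 100 analyses per day`);
```

At 100 analyses per day this reproduces the $150 to $1,500 range quoted above.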

1.2 Solution Overview

CKB (Code Knowledge Backend) is a purpose-built structural code analysis tool that provides architecture detection, risk hotspot identification, security auditing, dead code detection, coupling analysis, complexity metrics, ownership tracking, and PR impact analysis — entirely without LLM calls. It operates on locally cloned repositories using static analysis, git log mining, and graph-based dependency resolution.

The integration runs CKB as a sidecar Docker container alongside the nyxCore application stack. The nyxCore application server communicates with CKB by executing CLI commands inside the container via the Docker socket, parsing the JSON output, and caching results in the existing PostgreSQL database. This approach eliminates per-invocation API fees (only compute is consumed), serves cached results in under a second, and is fully independent of external API availability.


2. Architecture

2.1 Worker Pattern

CKB provides two operational modes: an HTTP Index-Server for SCIP-based symbol/reference queries, and a CLI for structural analysis. The analysis commands (arch, hotspots, audit, coupling, complexity, callgraph, dead-code, ownership, impact, pr-summary, search) are available only through the CLI and run against a local repository checkout. The Index-Server mode does not expose these analysis capabilities.

This architectural constraint dictates the worker pattern: CKB runs as a long-lived container with sleep infinity as its entrypoint (no server process), and the nyxCore application executes commands inside the container via docker exec. This design has three advantages over an HTTP server approach:

  1. Multi-repository support: Each docker exec call specifies the target repository path, enabling concurrent analysis across all tenant repositories without server reconfiguration.
  2. No connection management: There are no HTTP connections to maintain, no connection pools to configure, and no health-check endpoints to monitor beyond the binary availability check (ckb version).
  3. Process isolation: Each analysis runs as an independent process within the container. A crash or timeout in one analysis does not affect others.

The trade-off is the overhead of process spawning per command (approximately 50-100ms for docker exec initialization), which is negligible compared to the analysis computation time.
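The worker pattern described in this section can be sketched as a thin client that builds a docker exec argument list and parses the JSON printed by CKB. This is a sketch under stated assumptions, not the nyxCore implementation: the function and type names are hypothetical, and selecting the repository through docker exec's --workdir option is an assumption, since the text does not say how the repository path reaches the CKB CLI. The process runner is injected so the sketch stays independent of any particular spawn API.

```typescript
// Hypothetical names throughout. Only `docker exec`, `ckb <cmd>`, `--format json`
// and `ckb version` come from the text; using --workdir to select the repository
// is an assumption.
type ExecFn = (file: string, args: string[]) => Promise<string>; // resolves with stdout

function buildDockerExecArgs(container: string, repoPath: string, command: string): string[] {
  // An argv array (rather than a shell string) means repoPath is never
  // interpreted by a shell.
  return ["exec", "--workdir", repoPath, container, "ckb", command, "--format", "json"];
}

async function runCkb(
  exec: ExecFn,
  container: string,
  repoPath: string,
  command: string,
): Promise<unknown> {
  const stdout = await exec("docker", buildDockerExecArgs(container, repoPath, command));
  return JSON.parse(stdout); // throws if CKB printed something other than JSON
}

// Binary availability check: succeeds if `ckb version` runs inside the container.
async function isAvailable(exec: ExecFn, container: string): Promise<boolean> {
  try {
    await exec("docker", ["exec", container, "ckb", "version"]);
    return true;
  } catch {
    return false;
  }
}
```

In a Node.js server, exec would typically wrap child_process.execFile with a timeout and resolve with the captured stdout. Because every call names its own repository path, concurrent analyses of different repositories need no shared state.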

2.2 Data Flow

The complete data flow from repository linking to workflow injection follows this path:

┌──────────────────────────────────────────────────────────────────────┐
│                        nyxCore Application                           │
│                                                                      │
│  ┌─────────────┐    ┌──────────────┐    ┌─────────────────────────┐  │
│  │  tRPC Router │───>│  CKB Client  │───>│  docker exec ckb <cmd> │  │
│  │  (13 procs)  │    │  (execFile)  │    │  --format json         │  │
│  └─────┬───────┘    └──────┬───────┘    └────────────┬────────────┘  │
│        │                   │                          │               │
│        │            ┌──────▼───────┐          ┌──────▼──────────┐    │
│        │            │  JSON Parse  │          │  CKB Container  │    │
│        │            │  + Validate  │          │  (worker mode)  │    │
│        │            └──────┬───────┘          │                 │    │
│        │                   │                  │  /data/repos/   │    │
│        │            ┌──────▼───────┐          │   ├─ <projA>/   │    │
│        │            │  PostgreSQL  │          │   ├─ <projB>/   │    │
│        │            │  (analysis   │          │   └─ <projN>/   │    │
│        │            │   Cache)     │          └─────────────────┘    │
│        │            └──────┬───────┘                                  │
│        │                   │                                          │
│  ┌─────▼───────┐    ┌─────▼────────┐    ┌──────────────────────┐    │
│  │  UI Cards   │    │ Content Loader│───>│  Redis Cache         │    │
│  │  (React)    │    │ ({{ckb}})    │    │  (1h TTL, 8K chars)  │    │
│  └─────────────┘    └──────────────┘    └──────────────────────┘    │
└──────────────────────────────────────────────────────────────────────┘

Phase 1 — Clone and Index:

  1. Project links a GitHub repository (owner/repo stored on Project model).
  2. isAvailable() checks CKB container responsiveness via docker exec ckb ckb version.
  3. cloneRepo() executes git clone --depth 1 inside the container into /data/repos/<projectId>.
  4. runFullAnalysis() sequentially invokes arch, hotspots, audit, and dead-code commands.
  5. Results are stored as a JSON blob in ProjectCkbIndex.analysisCache.
  6. Status transitions: pending -> cloning -> indexing -> ready (or failed).
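The status lifecycle in step 6 can be modelled as a small state machine. The status names come from the list above; the transition table is an assumption, in particular that any non-terminal state may move to failed. All identifiers are illustrative.

```typescript
// Status names come from the text; the allowed-transition table is an assumption.
type IndexStatus = "pending" | "cloning" | "indexing" | "ready" | "failed";

const NEXT: Record<IndexStatus, readonly IndexStatus[]> = {
  pending: ["cloning", "failed"],
  cloning: ["indexing", "failed"],
  indexing: ["ready", "failed"],
  ready: [],
  failed: [],
};

function canTransition(from: IndexStatus, to: IndexStatus): boolean {
  return NEXT[from].indexOf(to) !== -1;
}

// Returns the new status, or throws if the move is not in the table.
function transition(from: IndexStatus, to: IndexStatus): IndexStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal status transition: ${from} -> ${to}`);
  }
  return to;
}
```

Re-analysis of an index that is already ready would need an extra edge (for example ready to indexing); the text does not specify one, so it is left out here.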

Phase 2 — Serve:

  • UI: tRPC queries read from analysisCache (overview cards) or invoke CKB live (detail queries).
  • Workflows: loadCkbContent() reads from Redis cache (or DB fallback), formats as markdown, injects into {{ckb}} template variable.
  • Refresh: Manual re-analyze or webhook triggers git pull + runFullAnalysis() + cache invalidation.

2.3 Security Model

The CKB integration introduces four security boundaries:

Path Traversal Prevention. All repository path arguments are validated against a strict prefix (/data/repos/) and rejected if they contain .. sequences. The deleteRepo() function additionally enforces that the path is exactly one level deep under the base directory, preventing wildcard deletions.
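A minimal sketch of these two checks, assuming plain string validation before any command is spawned. The helper names are hypothetical, and restricting the directory name to a conservative character set is an added assumption that also rules out wildcard characters.

```typescript
// Hypothetical helpers; only the /data/repos/ prefix and the ".." rule come from the text.
const REPOS_BASE = "/data/repos/";

function isSafeRepoPath(path: string): boolean {
  if (!path.startsWith(REPOS_BASE)) return false; // strict prefix check
  if (path.includes("..")) return false;          // reject any ".." sequence
  return path.length > REPOS_BASE.length;         // must name something under the base
}

// Stricter check for deletion: exactly one directory level below the base.
function isDeletableRepoPath(path: string): boolean {
  if (!isSafeRepoPath(path)) return false;
  const rest = path.slice(REPOS_BASE.length).replace(/\/$/, "");
  // Assumed naming rule: letters, digits, "_" and "-" only, so "/" and "*" are rejected.
  return /^[A-Za-z0-9_-]+$/.test(rest);
}

console.log(isSafeRepoPath("/data/repos/proj1/src"));      // true
console.log(isDeletableRepoPath("/data/repos/proj1/src")); // false
```

Validating before the command is built keeps the rule in one place, regardless of which CKB command consumes the path.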

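Docker Socket Read-Only Mount. The Docker socket is mounted read-only (/var/run/docker.sock:ro) in the nyxCore application container. The :ro flag protects the socket file itself from being replaced or removed; it does not filter the Docker API requests sent over the socket, so a process that can connect can still issue any API call, including creating, starting, or stopping containers (docker exec is itself such a request). Restricting the application to exec calls against the pre-existing CKB container therefore requires an additional control, such as an API-filtering socket proxy.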

Transient Token Passthrough. For repositories requiring authentication, a short-lived GitHub token is passed as an environment variable to the docker exec command rather than being stored in the CKB container or on disk. The token exists only for the duration of the clone/pull operation.
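One way to keep the token out of both the container and the command line is sketched below. It relies on two assumptions that should be verified against the Docker CLI version in use: that the variable is called GITHUB_TOKEN, and that docker exec accepts the name-only form --env GITHUB_TOKEN, which copies the value from the docker CLI process's own environment. How git inside the container consumes the variable (for example through a credential helper) is outside this sketch. All names are hypothetical.

```typescript
// Hypothetical sketch; see the assumptions stated in the paragraph above.
interface ExecInvocation {
  file: string;
  args: string[];
  env: Record<string, string | undefined>;
}

function buildCloneInvocation(
  container: string,
  repoUrl: string,
  targetPath: string,
  token: string,
  baseEnv: Record<string, string | undefined> = {},
): ExecInvocation {
  return {
    file: "docker",
    // Name-only --env: the token value never appears in argv, so it does not
    // show up in process listings.
    args: [
      "exec", "--env", "GITHUB_TOKEN", container,
      "git", "clone", "--depth", "1", repoUrl, targetPath,
    ],
    // Environment of the short-lived docker CLI process only; nothing is written to disk.
    env: { ...baseEnv, GITHUB_TOKEN: token },
  };
}
```

The resulting file, args, and env can be handed to child_process.execFile (with env passed in its options object), with the server's own environment supplied as baseEnv so that PATH remains available to the docker binary.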

PostgreSQL RLS Tenant Isolation. The ProjectCkbIndex table is covered by Row-Level Security policies. Each tenant's analysis cache is completely isolated from other tenants' data at the database layer.


3. Analysis Commands

CKB exposes 11 analysis commands via its CLI:

Command     Description                                                  Mode
----------  -----------------------------------------------------------  ------
arch        Architecture detection, module boundaries, dependency graph  Cached
hotspots    Risk hotspots by churn rate + complexity                     Cached
audit       Security audit: hardcoded secrets, unsafe patterns           Cached
dead-code   Unreachable functions, unused exports                        Cached
coupling    Coupling analysis between modules                            Live
complexity  Cyclomatic complexity per file/function                      Live
callgraph   Function call graph for a specific entrypoint                Live
ownership   Git blame-based ownership per file                           Live
impact      Impact analysis for a changed file                           Live
pr-summary  Summary of changes in a PR diff                              Live
search      Symbol/identifier search across the codebase                 Live

"Cached" commands are run during runFullAnalysis() and stored in ProjectCkbIndex.analysisCache. "Live" commands are invoked on-demand by specific tRPC queries.

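The split between cached and live commands can be captured as typed data, so that the full-analysis step and the on-demand queries draw on one list. A minimal sketch; the command names mirror the list in this section, while the type and helper names are illustrative.

```typescript
// Command names and modes mirror this section; identifiers are illustrative.
type CommandMode = "cached" | "live";

const CKB_COMMANDS: Record<string, CommandMode> = {
  arch: "cached",
  hotspots: "cached",
  audit: "cached",
  "dead-code": "cached",
  coupling: "live",
  complexity: "live",
  callgraph: "live",
  ownership: "live",
  impact: "live",
  "pr-summary": "live",
  search: "live",
};

function commandsByMode(mode: CommandMode): string[] {
  return Object.keys(CKB_COMMANDS).filter((name) => CKB_COMMANDS[name] === mode);
}

// The four commands that runFullAnalysis() executes up front.
console.log(commandsByMode("cached"));
```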

4. Template Variable Integration

The {{ckb}} template variable injects a formatted markdown summary of the CKB analysis into workflow prompts. This enables workflows to reference:

  • Architecture overview: module structure, dependency counts, entry points
  • Risk hotspots: top-N files by combined churn + complexity score
  • Security findings: hardcoded credentials, unsafe function calls
  • Dead code inventory: unused exports, unreachable code paths

Loading logic (loadCkbContent()):

  1. Check Redis cache (1h TTL, 8K character limit)
  2. On cache miss, query ProjectCkbIndex.analysisCache from PostgreSQL
  3. Format JSON analysis results as markdown sections
  4. Store formatted markdown in Redis
  5. Return to template resolver
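The five steps above follow a cache-aside pattern, sketched here with injected dependencies. The interfaces, the cache key format, and the markdown layout are hypothetical; only the step order, the one-hour TTL, and the 8K character limit come from the text, and enforcing the limit by truncation is an assumption.

```typescript
// Hypothetical interfaces and key format; truncating to the limit is an assumption.
interface Cache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}
type AnalysisCache = Record<string, unknown>; // e.g. { arch: {...}, hotspots: [...] }

const TTL_SECONDS = 60 * 60; // 1h
const MAX_CHARS = 8000;      // "8K characters"

// Placeholder formatting: one heading per analysis, with the raw JSON as its body.
function formatAsMarkdown(analysis: AnalysisCache, maxChars: number = MAX_CHARS): string {
  const sections = Object.keys(analysis).map(
    (name) => `## ${name}\n\n${JSON.stringify(analysis[name], null, 2)}`,
  );
  return sections.join("\n\n").slice(0, maxChars);
}

async function loadCkbContent(
  projectId: string,
  cache: Cache,
  loadFromDb: (projectId: string) => Promise<AnalysisCache | null>,
): Promise<string> {
  const key = `ckb:${projectId}`;
  const cached = await cache.get(key);          // 1. Redis lookup
  if (cached !== null) return cached;
  const analysis = await loadFromDb(projectId); // 2. PostgreSQL fallback
  if (analysis === null) return "";             //    nothing indexed yet
  const markdown = formatAsMarkdown(analysis);  // 3. format as markdown sections
  await cache.set(key, markdown, TTL_SECONDS);  // 4. store in Redis
  return markdown;                              // 5. hand back to the template resolver
}
```

Caching the formatted markdown rather than the raw JSON means a cache hit skips both the database read and the formatting step.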

Cache invalidation triggers:

  • Manual re-analyze via the Code Intelligence UI
  • Webhook from GitHub (push events on the tracked branch)
  • git pull during a sync cycle that detects changed files

5. tRPC Procedures

The ckb router exposes 13 procedures:

Procedure        Type      Description
---------------  --------  ---------------------------------
status           query     Check CKB container availability
getIndex         query     Get cached analysis for a project
startClone       mutation  Clone repo + run full analysis
reanalyze        mutation  Re-run analysis on existing clone
getArchitecture  query     Architecture graph with live data
getHotspots      query     Risk hotspot rankings
getAudit         query     Security audit findings
getDeadCode      query     Dead code inventory
getCoupling      query     Module coupling metrics
getComplexity    query     File/function complexity
getOwnership     query     File ownership by contributor
searchSymbol     query     Symbol search across codebase
deleteIndex      mutation  Remove clone + cached analysis

6. Docker Compose Configuration

services:
  ckb:
    image: ghcr.io/simplyliz/ckb:latest
    container_name: nyxcore-ckb
    entrypoint: ["sleep", "infinity"]
    volumes:
      - ckb_repos:/data/repos
    networks:
      - nyxcore-net
    restart: unless-stopped

  app:
    # ...
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ckb_repos:/data/repos
    environment:
      CKB_CONTAINER_NAME: nyxcore-ckb
      CKB_REPOS_PATH: /data/repos

volumes:
  ckb_repos:

The shared ckb_repos volume is mounted at /data/repos in both the app container and the ckb container, so both see the same repository directory tree. A path that the app passes to docker exec therefore resolves to the same clone that CKB reads during analysis.