GitHub Connector


17. GitHub Connector Service

The GitHub connector (src/server/services/github-connector.ts) provides authenticated access to the GitHub REST API using the BYOK (Bring Your Own Key) pattern. It handles repository enumeration, memory file synchronization, file tree retrieval, and write operations (branch creation, file commits, pull requests). Every call is scoped to a tenant's encrypted GitHub PAT stored in the api_keys table.

17.1 Token Resolution (BYOK)

GitHub authentication follows the same BYOK pattern as LLM providers. The tenant's personal access token is stored encrypted (AES-256-GCM) in the api_keys table and decrypted per-request via the shared decrypt() utility from the crypto service.

export async function resolveGitHubToken(tenantId: string): Promise<string> {
  const apiKey = await prisma.apiKey.findFirst({
    where: { tenantId, provider: "github" },
    orderBy: { updatedAt: "desc" },
  });
  if (!apiKey) {
    throw new Error(
      "No GitHub token configured. Add a GitHub personal access token in Admin > API Keys."
    );
  }
  return decrypt(apiKey.encryptedKey);
}

When multiple GitHub keys exist for a tenant, the most recently updated one is used (orderBy: { updatedAt: "desc" }).

Data Model: ApiKey

Column        Type       Description
id            UUID       Primary key
tenantId      UUID       Tenant scope (RLS enforced)
userId        UUID       Key creator
name          String     User-defined label
provider      String     "github" for GitHub PATs
encryptedKey  Text       AES-256-GCM format: v1:<iv>:<tag>:<ciphertext>
lastUsedAt    DateTime?  Last access timestamp
expiresAt     DateTime?  Optional expiration

17.2 ghFetch Helper

All GitHub API calls go through a typed generic helper that handles URL construction, auth headers, and error handling:

async function ghFetch<T>(
  token: string,
  endpoint: string,
  options?: { method?: string; body?: unknown }
): Promise<T>

Key behaviors:

  • Accepts both relative paths (/user/repos) and absolute URLs (https://api.github.com/...)
  • Sets Authorization: Bearer, Accept: application/vnd.github+json, and X-GitHub-Api-Version: 2022-11-28
  • Adds Content-Type: application/json only when a body is present
  • Throws on non-2xx responses with the status code and full response body text
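The behaviors above can be sketched as follows — a plausible shape for the helper, not the verbatim implementation:

```typescript
const GITHUB_API = "https://api.github.com";

// Resolve relative paths against the API base; pass absolute URLs through.
function resolveEndpoint(endpoint: string): string {
  return endpoint.startsWith("http") ? endpoint : `${GITHUB_API}${endpoint}`;
}

async function ghFetch<T>(
  token: string,
  endpoint: string,
  options?: { method?: string; body?: unknown }
): Promise<T> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${token}`,
    Accept: "application/vnd.github+json",
    "X-GitHub-Api-Version": "2022-11-28",
  };
  // Content-Type is set only when a body is present.
  if (options?.body !== undefined) headers["Content-Type"] = "application/json";

  const res = await fetch(resolveEndpoint(endpoint), {
    method: options?.method ?? "GET",
    headers,
    body: options?.body !== undefined ? JSON.stringify(options.body) : undefined,
  });
  if (!res.ok) {
    // Throw with status code and full response body text.
    throw new Error(`GitHub API ${res.status}: ${await res.text()}`);
  }
  return (await res.json()) as T;
}
```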

17.3 Repository Enumeration (fetchRepos)

fetchRepos builds a comprehensive list of repositories accessible to the authenticated user. It combines two data sources in parallel for completeness, then deduplicates.

flowchart LR
  A[fetchRepos] --> B["Promise.all"]
  B --> C["/user/repos\n(owner, collaborator,\norg_member)\nup to 2 pages"]
  B --> D["/user/orgs\n(up to 100)"]
  D --> E["fetchOrgRepos\n(up to 10 orgs,\n3 pages each)"]
  C --> F["Deduplicate by\nfull_name\n(user repos win)"]
  E --> F
  F --> G["Normalized list:\nname, fullName,\nowner, description,\nprivate"]

Pagination limits:

  • User repos: up to 2 pages of 100 (200 repos max)
  • Org repos: up to 3 pages per org (300 repos/org), capped at 10 organizations
  • Organization enumeration: 1 page (100 orgs)

Deduplication: Repos are deduped by full_name. User repos take priority -- an org-fetched duplicate only enters the map if the key is not already present. The fetchUserOrgs call fails gracefully (returns []) if the PAT lacks read:org scope. Individual fetchOrgRepos calls also fail gracefully, returning whatever was fetched before the error.
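The priority rule can be sketched as a two-pass merge into a Map keyed by full_name. Field names follow the normalized list above; dedupeRepos is an illustrative name:

```typescript
interface RepoSummary {
  name: string;
  fullName: string;
  owner: string;
  description: string | null;
  private: boolean;
}

// Merge user-owned and org repos, deduplicating by fullName.
// User repos are inserted first, so an org duplicate never overwrites one.
function dedupeRepos(userRepos: RepoSummary[], orgRepos: RepoSummary[]): RepoSummary[] {
  const byName = new Map<string, RepoSummary>();
  for (const repo of userRepos) byName.set(repo.fullName, repo);
  for (const repo of orgRepos) {
    if (!byName.has(repo.fullName)) byName.set(repo.fullName, repo);
  }
  return [...byName.values()];
}
```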

17.4 Memory Path Operations

checkMemoryPath

Checks if a .memory directory (or custom path) exists in a repo and counts letter*.md files:

export async function checkMemoryPath(
  token: string, owner: string, repo: string, path = ".memory"
): Promise<{ exists: boolean; fileCount: number }>

Files are matched with the regex /^letter.*\.md$/i. Returns { exists: false, fileCount: 0 } on any error (missing path, auth failure, 404).

fetchMemoryFiles

Fetches full content of all letter*.md files from the memory directory, sorted newest-first by filename (localeCompare descending). Each file is parsed into a GitHubMemoryEntry:

interface GitHubMemoryEntry {
  title: string;      // From markdown heading, YAML frontmatter, or filename
  content: string;    // Raw markdown file content
  sourceRef: string;  // "github://owner/repo/.memory/letter_YYYYMMDD_XXXX.md"
  tags: string[];     // From YAML frontmatter or default ["memory", "imported"]
  createdAt: Date;    // From filename date pattern or current date
}

Content is fetched via download_url from the Contents API response, with the Bearer token passed in the request headers. Files without a download_url are silently skipped.
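The filter-and-sort step can be sketched in isolation — descending localeCompare yields newest-first because of the YYYYMMDD naming convention (selectMemoryFiles is an illustrative name):

```typescript
const LETTER_FILE = /^letter.*\.md$/i;

// Keep only letter*.md entries and sort newest-first by filename.
function selectMemoryFiles(names: string[]): string[] {
  return names
    .filter((name) => LETTER_FILE.test(name))
    .sort((a, b) => b.localeCompare(a));
}
```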

listMemoryFiles

Metadata-only variant for selection UIs. Returns { name, path, sha, sourceRef } without fetching file content. Uses the same filtering regex and descending sort as fetchMemoryFiles.

17.5 File Content & Tree Operations

fetchFileContent

Fetches a single file's raw content. Two-step process: first resolves the download_url from the Contents API endpoint, then fetches the raw text with the auth token. Throws if download_url is null or the download request fails.

fetchRepoTree

Uses the Git Trees API (/git/trees/{branch}?recursive=1) to get a flat list of all file paths in a repository. Only blob entries (files, not directories) are returned.

Branch fallback: If the branch parameter is "main" and the API call fails, the function automatically retries with "master". Any other branch value throws on failure.
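The fallback can be sketched with the tree call injected as a callback, which keeps the branch logic testable apart from the network layer (fetchTreeWithFallback and the callback signature are illustrative):

```typescript
type TreeFetcher = (branch: string) => Promise<string[]>;

// Retry "main" as "master" on failure; any other branch propagates the error.
async function fetchTreeWithFallback(
  branch: string,
  fetchTree: TreeFetcher
): Promise<string[]> {
  try {
    return await fetchTree(branch);
  } catch (err) {
    if (branch === "main") return fetchTree("master");
    throw err;
  }
}
```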

IGNORE_PATTERNS -- the following paths and files are filtered out:

Pattern                 Matches
^node_modules/          Node.js dependencies
^\.git/                 Git internals
^\.next/                Next.js build output
^dist/, ^build/         Compiled output
^coverage/              Test coverage
^\.cache/, ^\.turbo/    Build caches
^vendor/                Vendored dependencies
^__pycache__/           Python bytecode
\.lock$                 Generic lockfiles
^package-lock\.json$    npm lockfile
^yarn\.lock$            Yarn lockfile
^pnpm-lock\.yaml$       pnpm lockfile

The workflow engine further caps tree output at FILE_TREE_MAX_ENTRIES = 500 entries when injecting into prompts via {{fileTree}}.
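As a sketch, the patterns above reduce to a single filter pass over the blob paths (the regex list mirrors the table; filterTree is an illustrative name):

```typescript
const IGNORE_PATTERNS: RegExp[] = [
  /^node_modules\//, /^\.git\//, /^\.next\//,
  /^dist\//, /^build\//, /^coverage\//,
  /^\.cache\//, /^\.turbo\//, /^vendor\//, /^__pycache__\//,
  /\.lock$/, /^package-lock\.json$/, /^yarn\.lock$/, /^pnpm-lock\.yaml$/,
];

// Drop any blob path matching an ignore pattern.
function filterTree(paths: string[]): string[] {
  return paths.filter((p) => !IGNORE_PATTERNS.some((re) => re.test(p)));
}
```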

17.6 Write Operations

Three functions support creating branches, committing files, and opening PRs. These are used by the auto-fix and refactor pipelines to push generated fixes back to GitHub.

flowchart TD
  A[createBranch] -->|"resolve SHA\nfrom fromRef"| B["/git/ref/heads/{fromRef}"]
  B -->|"POST new ref"| C["/git/refs"]
  D[getFileSha] -->|"GET contents\nfor sha check"| E["sha or null"]
  E --> F[createOrUpdateFile]
  F -->|"PUT base64 content\n+ commit message"| G["/contents/{path}"]
  H[createPullRequest] -->|"POST"| I["/pulls"]
  I --> J["{ number, html_url }"]

createBranch

Resolves the SHA of fromRef (default "main") via /git/ref/heads/{fromRef}, then creates a new ref at that SHA via POST to /git/refs.

createOrUpdateFile

Creates or updates a file in a repository. Content is Base64-encoded via Buffer.from(content).toString("base64"). If sha is provided, the existing file is updated (required by the GitHub API for updates); otherwise a new file is created.

createPullRequest

Creates a pull request and returns { number, html_url }. Parameters: title, body, head branch, and base branch (default "main").

getFileSha

Retrieves the SHA of an existing file at a specific path and branch. Returns null if the file does not exist, making it safe to call before createOrUpdateFile to determine create-vs-update behavior.
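The create-vs-update handshake can be sketched with the two calls injected; the interface and upsertFile name are illustrative, not the connector's actual signatures:

```typescript
interface FileOps {
  getFileSha: (path: string, branch: string) => Promise<string | null>;
  createOrUpdateFile: (
    path: string,
    content: string,
    branch: string,
    sha?: string
  ) => Promise<void>;
}

// Look up the existing SHA first; passing it through makes the PUT an
// update when the file exists and a create when it does not.
async function upsertFile(
  ops: FileOps,
  path: string,
  content: string,
  branch: string
): Promise<"created" | "updated"> {
  const sha = await ops.getFileSha(path, branch);
  await ops.createOrUpdateFile(path, content, branch, sha ?? undefined);
  return sha ? "updated" : "created";
}
```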

17.7 Database Sync (syncToDatabase)

syncToDatabase persists fetched GitHubMemoryEntry records to the memory_entries table. It uses sourceRef as the deduplication key scoped to tenantId + userId, so the same letter file is never imported twice for the same user.

export async function syncToDatabase(
  tenantId: string, userId: string, projectId: string,
  entries: GitHubMemoryEntry[]
): Promise<{ created: number; skipped: number }>

Each entry is created as a MemoryEntry with:

  • source: "github"
  • sourceRef: "github://owner/repo/.memory/letter_YYYYMMDD_XXXX.md"
  • metadata: { projectId } linking back to the nyxCore project

The return value reports how many entries were created vs. skipped (already existing).
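The dedup decision can be sketched independently of Prisma: given the set of sourceRefs already stored for a tenant/user, partition incoming entries into to-create vs. skipped (planSync is an illustrative name; the real function issues the lookups and inserts itself):

```typescript
interface SyncPlan<T> {
  toCreate: T[];
  skipped: number;
}

// Entries whose sourceRef already exists for this tenant/user are skipped;
// the rest are queued for insertion.
function planSync<T extends { sourceRef: string }>(
  existingRefs: Set<string>,
  entries: T[]
): SyncPlan<T> {
  const toCreate = entries.filter((e) => !existingRefs.has(e.sourceRef));
  return { toCreate, skipped: entries.length - toCreate.length };
}
```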

17.8 Memory File Parsing Helpers

Three private helpers extract structured metadata from raw letter files:

Helper                             Strategy                                                           Fallback
extractTitle(content, filename)    1. First # heading in markdown, 2. YAML frontmatter title: field   Filename with .md stripped and _ replaced by spaces
extractTags(content)               YAML frontmatter tags: [...] array, comma-separated                ["memory", "imported"]
extractDateFromFilename(filename)  Regex match on letter_YYYYMMDD pattern                             new Date() (current timestamp)

flowchart TD
  A["letter_20260224_0011.md"] --> B["extractDateFromFilename"]
  B --> C["2026-02-24"]
  D["# Session Notes\n---\ntags: ['review', 'sprint-3']\n---"] --> E["extractTitle"]
  E --> F["Session Notes"]
  D --> G["extractTags"]
  G --> H["['review', 'sprint-3']"]

The tag extraction strips surrounding quotes (single or double) from each tag value. The date extraction specifically matches the letter_YYYYMMDD prefix -- the trailing _XXXX sequence number is ignored for date purposes.
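The date and tag helpers can be sketched from the behavior described above. The names come from the table; the exact frontmatter parsing is an assumption:

```typescript
// Match the letter_YYYYMMDD prefix; the trailing _XXXX sequence number
// is ignored for date purposes.
function extractDateFromFilename(filename: string): Date {
  const m = filename.match(/^letter_(\d{4})(\d{2})(\d{2})/i);
  if (!m) return new Date();
  return new Date(Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3])));
}

// Pull tags from a YAML frontmatter `tags: [...]` line, stripping
// surrounding single or double quotes from each value.
function extractTags(content: string): string[] {
  const m = content.match(/^tags:\s*\[(.*)\]/m);
  if (!m) return ["memory", "imported"];
  return m[1]
    .split(",")
    .map((t) => t.trim().replace(/^['"]|['"]$/g, ""))
    .filter(Boolean);
}
```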