Automatic Memo Version Discovery from Private GitHub Repo

Automatic discovery of the latest investment memo version from a private GitHub repo, with 10-minute TTL caching.

Claude

Changelog - 2025-12-13 (#1)

Dark Matter Site: Automatic Memo Version Discovery from Private GitHub Repo

Overview

Implemented automatic "latest version discovery" for investment memos, enabling the site to dynamically find and serve the most recent memo version for each company without manual configuration updates. The system fetches memos directly from the private lossless-group/dark-matter-private-data GitHub repository at runtime.

Key capabilities:

  • Automatic version discovery: Given a company name (e.g., "MitrixBio"), the system scans the outputs directory and finds the highest semver version
  • Dual-mode operation: Supports both GitHub API mode (production) and local filesystem mode (development)
  • Multiple naming conventions: Handles {Company}-{version}-draft.md and 6-{Company}-{version}.md formats
  • Caching: 10-minute TTL cache for discovery results to reduce API calls
  • Pipeline integration: Confidential pipeline page automatically resolves latest memos for all companies

Changes by Area

1. Version Discovery System (src/lib/github-content.ts)

Added comprehensive version discovery functionality:

  • parseVersion(version: string): Parses semver strings (e.g., "v0.0.2") into [major, minor, patch] tuples
  • compareVersions(a, b): Compares two version tuples for sorting
  • listCompanyVersionsGitHub(companyName): Lists version directories via GitHub API at deals/{Company}/outputs/
  • listCompanyVersionsLocal(companyName): Lists version directories from local orchestrator path
  • findDraftMemoGitHub(path, company, version): Finds the draft memo file with pattern matching
  • findDraftMemoLocal(path, company, version): Local filesystem equivalent
  • getLatestMemoSlug(companyName): Main discovery function - returns the slug for the latest memo
  • resolveLatestMemos(companyNames[]): Batch resolver for efficiency (parallel execution)
  • fetchLocalMemoContent(slug): Fetches memo content from local orchestrator directory
  • deriveGitHubPathFromSlug(slug): Derives GitHub path from slug, handling multiple formats:
    • Numbered prefix: 6-RavenGraph-v0.0.3 → deals/RavenGraph/outputs/RavenGraph-v0.0.3/6-RavenGraph-v0.0.3.md
    • URL-safe: Aito-v002-draft → deals/Aito/outputs/Aito-v0.0.2/Aito-v0.0.2-draft.md
    • Dotted: MitrixBio-v0.0.2-draft → deals/MitrixBio/outputs/MitrixBio-v0.0.2/MitrixBio-v0.0.2-draft.md
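
As a rough illustration of the parsing and comparison helpers listed above, here is a minimal sketch. The function bodies are assumptions rather than the code in src/lib/github-content.ts, and pickLatestVersion is a hypothetical helper added only to show how the two pieces could combine.

```typescript
// Sketch only: the real implementations in src/lib/github-content.ts may differ.
type VersionTuple = [number, number, number];

// Parse "v0.0.2" or "0.0.2" into [0, 0, 2]; return null for anything else.
function parseVersion(version: string): VersionTuple | null {
  const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(version.trim());
  if (!match) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, positive if a > b, zero if equal.
function compareVersions(a: VersionTuple, b: VersionTuple): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Hypothetical helper: pick the highest version among directory names
// shaped like "MitrixBio-v0.0.2". Names that do not parse are skipped.
function pickLatestVersion(company: string, dirNames: string[]): string | null {
  const prefix = `${company}-`;
  let best: { raw: string; tuple: VersionTuple } | null = null;
  for (const name of dirNames) {
    if (!name.startsWith(prefix)) continue;
    const raw = name.slice(prefix.length);
    const tuple = parseVersion(raw);
    if (tuple && (best === null || compareVersions(tuple, best.tuple) > 0)) {
      best = { raw, tuple };
    }
  }
  return best ? best.raw : null;
}
```

Comparing numeric tuples rather than strings matters once a component reaches two digits: a plain string sort would rank v0.0.9 above v0.0.10, while the tuple comparison does not.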

Updated getConfig() to support MEMO_DISCOVERY_LOCAL environment variable for forcing local mode during development.

Updated isLocalDemoMode() to check both useLocalFallback and forceLocalDiscovery flags.

Updated fetchMemoBySlug() to use local content fetching when in local discovery mode.
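
The flag handling described above might look roughly like the sketch below. The function names readContentConfig and isLocalMode are hypothetical stand-ins for getConfig() and isLocalDemoMode(), the env parameter is added only to keep the example testable, and the rule used to derive useLocalFallback is an assumption.

```typescript
// Sketch only: field names follow the flags named in this changelog.
interface ContentConfig {
  useLocalFallback: boolean;
  forceLocalDiscovery: boolean;
}

type Env = Record<string, string | undefined>;

// Hypothetical stand-in for getConfig(); pass process.env in a Node context.
function readContentConfig(env: Env): ContentConfig {
  return {
    // Assumption: fall back to local content when no GitHub token is set.
    useLocalFallback: !env.GITHUB_CONTENT_PAT,
    // Only the literal string "true" forces local discovery.
    forceLocalDiscovery: env.MEMO_DISCOVERY_LOCAL === "true",
  };
}

// Hypothetical stand-in for isLocalDemoMode(): either flag enables local mode.
function isLocalMode(config: ContentConfig): boolean {
  return config.useLocalFallback || config.forceLocalDiscovery;
}
```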

2. Pipeline Data Model (src/content/pipeline/pipeline-companies.json)

Changed from explicit memo slugs to company keys:

Before:

{
  "conventionalName": "Mitrix Bio",
  "extendedMemoMD": "MitrixBio-v0.0.2-draft"
}

After:

{
  "conventionalName": "Mitrix Bio",
  "memoCompanyKey": "MitrixBio"
}

This decouples the pipeline data from specific memo versions - the system automatically discovers the latest version at runtime.

3. Confidential Pipeline Page (src/pages/pipeline/confidential/index.astro)

Updated to use the new discovery system:

// Get all company keys that need memo resolution
const companyKeys = pipelineData
  .map(c => c.memoCompanyKey)
  .filter((key): key is string => typeof key === "string");

// Resolve latest memo slugs for all companies (runs in parallel)
const memoSlugs = await resolveLatestMemos(companyKeys);

// Enhance pipeline data with resolved memo slugs
const enhancedPipelineData = pipelineData.map(company => ({
  ...company,
  resolvedMemoSlug: company.memoCompanyKey ? memoSlugs.get(company.memoCompanyKey) || null : null,
}));

The template then uses company.resolvedMemoSlug to build memo links.
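
For context, a batch resolver that returns a Map (as the memoSlugs.get(...) call above implies) could be sketched as follows. The discover parameter stands in for getLatestMemoSlug() and is passed in only to keep the example self-contained; the error handling is an assumption, not necessarily what the site does.

```typescript
// Sketch only: `discover` stands in for getLatestMemoSlug(); its signature is assumed.
type DiscoverFn = (companyName: string) => Promise<string | null>;

async function resolveLatestMemos(
  companyNames: string[],
  discover: DiscoverFn,
): Promise<Map<string, string | null>> {
  // De-duplicate so each company triggers at most one lookup.
  const unique = Array.from(new Set(companyNames));
  // Start all lookups at once and wait for every one to settle.
  const entries = await Promise.all(
    unique.map(async (name): Promise<[string, string | null]> => {
      try {
        return [name, await discover(name)];
      } catch {
        // Assumption: one failed lookup should not break the whole page.
        return [name, null];
      }
    }),
  );
  return new Map(entries);
}
```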

4. Pipeline Public Page Redirect Fix (src/pages/pipeline/index.astro)

Fixed the "Access Confidential View" CTA button:

Before:

<a href="/portfolio-gate" ...>

After:

<a href="/pipeline/confidential" ...>

This allows the middleware to properly redirect to the gate with the correct return path (redirect=/pipeline/confidential), so after authentication users return to the confidential pipeline page instead of the portfolio page.
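
The return-path behavior can be sketched as a small pure function. The gate path comes from the old link target above; the function name, the protected-prefix list, and the authentication flag are hypothetical, and the actual middleware may be structured differently.

```typescript
// Sketch only: constants and names below are assumptions for illustration.
const GATE_PATH = "/portfolio-gate";
const PROTECTED_PREFIXES = ["/pipeline/confidential"];

// Returns the gate URL to redirect to, or null when no redirect is needed.
function gateRedirectFor(pathname: string, isAuthenticated: boolean): string | null {
  const isProtected = PROTECTED_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(`${prefix}/`),
  );
  if (!isProtected || isAuthenticated) return null;
  // Carry the requested path so the gate can send the user back after login.
  return `${GATE_PATH}?redirect=${encodeURIComponent(pathname)}`;
}
```

Linking straight to the gate skips this step, so the gate never learns which page the user wanted. Linking to the protected page lets the middleware attach the return path itself (URL-encoded in this sketch).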

5. Environment Configuration

Updated .env and .env.example:

  • Added MEMO_DISCOVERY_LOCAL environment variable
  • Fixed GITHUB_CONTENT_REPO from dark-matter-secure-data to dark-matter-private-data

# GitHub Content Repository
GITHUB_CONTENT_PAT=github_pat_xxxxx
GITHUB_CONTENT_OWNER=lossless-group
GITHUB_CONTENT_REPO=dark-matter-private-data
GITHUB_CONTENT_BRANCH=main

# Memo Discovery Mode (Development)
# Set to 'true' to use local filesystem instead of GitHub API
MEMO_DISCOVERY_LOCAL=false

Files Changed Summary

Modified

  • src/lib/github-content.ts — Added version discovery system with GitHub API and local filesystem support
  • src/content/pipeline/pipeline-companies.json — Changed from extendedMemoMD to memoCompanyKey
  • src/pages/pipeline/confidential/index.astro — Integrated resolveLatestMemos() for automatic version discovery
  • src/pages/pipeline/index.astro — Fixed CTA redirect to use /pipeline/confidential
  • .env — Fixed repo name, added MEMO_DISCOVERY_LOCAL
  • .env.example — Updated documentation and defaults

Technical Details

GitHub Repository Structure

The discovery system expects memos in this structure:

deals/
├── Encellin/
│   └── outputs/
│       └── Encellin-v0.0.1/
│           └── Encellin-v0.0.1-draft.md
├── MitrixBio/
│   └── outputs/
│       ├── MitrixBio-v0.0.1/
│       └── MitrixBio-v0.0.2/
│           └── MitrixBio-v0.0.2-draft.md
└── RavenGraph/
    └── outputs/
        └── RavenGraph-v0.0.3/
            └── 6-RavenGraph-v0.0.3.md
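
Given this layout, mapping a slug to a repository path could work roughly as in the sketch below. It covers only the three slug formats listed for deriveGitHubPathFromSlug(slug); the regular expression and the single-digit assumption for URL-safe versions are illustrative guesses, not the actual implementation.

```typescript
// Sketch only: handles "6-RavenGraph-v0.0.3", "Aito-v002-draft",
// and "MitrixBio-v0.0.2-draft". Returns null for anything else.
function deriveGitHubPathFromSlug(slug: string): string | null {
  const match = /^(?:(\d+)-)?([A-Za-z0-9]+)-v(\d+\.\d+\.\d+|\d{3})(-draft)?$/.exec(slug);
  if (!match) return null;
  const [, prefix = "", company = "", rawVersion = "", draft = ""] = match;
  // URL-safe slugs drop the dots: "002" is read as "0.0.2".
  // Assumption: each version component is a single digit in that form.
  const version = rawVersion.includes(".") ? rawVersion : rawVersion.split("").join(".");
  const dir = `${company}-v${version}`;
  const file = `${prefix ? `${prefix}-` : ""}${dir}${draft}.md`;
  return `deals/${company}/outputs/${dir}/${file}`;
}
```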

Draft File Pattern Priority

When finding the draft memo in a version directory, the system checks patterns in this order:

  1. {Company}-{version}-draft.md (standard format)
  2. {N}-{Company}-{version}.md (numbered pipeline format)
  3. {Company}-{version}.md (without -draft suffix)
  4. Fallback: any .md file containing company name and version
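
The priority order above can be expressed as an ordered list of matchers, tried one at a time. This is a minimal sketch: pickDraftMemo and escapeRegExp are hypothetical names, {N} is assumed to be one or more digits, and version is assumed to include the leading "v" (for example "v0.0.3").

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sketch only: returns the first file name matching the highest-priority pattern.
function pickDraftMemo(fileNames: string[], company: string, version: string): string | null {
  const markdown = fileNames.filter((name) => name.endsWith(".md"));
  const stem = `${company}-${version}`;
  const numbered = new RegExp(`^\\d+-${escapeRegExp(stem)}\\.md$`);
  const matchers: Array<(name: string) => boolean> = [
    (name) => name === `${stem}-draft.md`, // 1. standard format
    (name) => numbered.test(name), // 2. numbered pipeline format
    (name) => name === `${stem}.md`, // 3. without -draft suffix
    (name) => name.includes(company) && name.includes(version), // 4. fallback
  ];
  for (const matches of matchers) {
    const found = markdown.find(matches);
    if (found !== undefined) return found;
  }
  return null;
}
```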

Caching Strategy

  • Discovery cache: 10-minute TTL for getLatestMemoSlug() results
  • Content cache: 5-minute TTL for fetched memo content
  • Caches are in-memory and reset on server restart
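
An in-memory TTL cache of this kind can be sketched in a few lines. The class name, the injectable clock, and the two cache instances at the bottom are illustrative assumptions; only the 10-minute and 5-minute durations come from the notes above.

```typescript
// Sketch only: a minimal in-memory cache whose entries expire after ttlMs.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns undefined on a miss or after expiry; a cached null is still a hit.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Hypothetical instances matching the TTLs listed above.
const discoveryCache = new TtlCache<string | null>(10 * 60 * 1000);
const contentCache = new TtlCache<string>(5 * 60 * 1000);
```

Because the Map lives in process memory, a server restart starts with empty caches, which matches the behavior noted above.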

Notes / Follow-Ups

  • When new memo versions are pushed to the GitHub repo (e.g., MitrixBio-v0.0.3), they'll be automatically discovered without code changes
  • The 10-minute cache means new versions may take up to 10 minutes to appear after being pushed
  • For development, set MEMO_DISCOVERY_LOCAL=true to use local orchestrator files instead of GitHub API
  • The local orchestrator path is hardcoded to /Users/mpstaton/code/lossless-monorepo/ai-labs/investment-memo-orchestrator/io/dark-matter/deals/ - this should be made configurable for other developers
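
One possible shape for that last follow-up, offered only as a sketch: read the path from an environment variable and fall back to the current value. The variable name MEMO_LOCAL_DEALS_PATH and the function name are hypothetical; nothing in the codebase defines them yet.

```typescript
// Current hardcoded location, kept as the default so existing setups keep working.
const DEFAULT_LOCAL_DEALS_PATH =
  "/Users/mpstaton/code/lossless-monorepo/ai-labs/investment-memo-orchestrator/io/dark-matter/deals/";

type Env = Record<string, string | undefined>;

// Hypothetical: MEMO_LOCAL_DEALS_PATH is an assumed variable name.
// Pass process.env in a Node context.
function resolveLocalDealsPath(env: Env): string {
  const configured = (env.MEMO_LOCAL_DEALS_PATH ?? "").trim();
  const base = configured.length > 0 ? configured : DEFAULT_LOCAL_DEALS_PATH;
  // Normalize to a trailing slash so callers can append "{Company}/outputs/".
  return base.endsWith("/") ? base : `${base}/`;
}
```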