2 points by SimplyLiz 12 days ago | 2 comments
  • justinlords 11 days ago
    This is exactly what I've been screaming about: AI coding assistants are basically playing "guess the impact" with our production code. The fact that you're exposing actual call graphs and blast radius through MCP tools instead of making Claude hallucinate dependencies is huge from my POV. Installing this now to test with our multi-repo setup. Does the telemetry integration for dead code detection require specific instrumentation, or does it hook into existing APM tools?
    • SimplyLiz 11 days ago
      Thanks! For multi-repo, check out the federation features (--preset federation). It handles cross-repo symbol resolution and blast radius across service boundaries.

      See docs: https://codeknowledge.dev/docs/Federation

      On dead code detection: CKB has two modes:

      1. Static analysis (findDeadCode tool, v7.6+) - requires zero instrumentation. Uses the SCIP index to find symbols with no inbound references in the codebase. Good for finding obviously dead exports, unused internal functions, etc. No telemetry needed.
      2. Telemetry-enhanced (findDeadCodeCandidates, v6.4+) - ingests runtime call data to find code that exists but is never executed in production. This is where APM integration comes in.
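
      Conceptually, the static pass boils down to something like this (made-up types for illustration, not CKB's actual code):

        // Sketch only: hypothetical shapes, not CKB's real types.
        interface IndexedSymbol {
          id: string;          // SCIP symbol ID
          file: string;
          inboundRefs: number; // references to this symbol found anywhere in the index
        }

        // A symbol nothing else references is a dead-code candidate.
        function staticDeadCode(symbols: IndexedSymbol[]): IndexedSymbol[] {
          return symbols.filter((s) => s.inboundRefs === 0);
        }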

      For the telemetry integration: it hooks into any OTEL-compatible collector. No custom instrumentation is required; it parses standard OTLP metrics:

      - Metric names: span.calls, http.server.request.count, rpc.server.duration_count, grpc.server.duration_count
      - Function/namespace/file are extracted from span attributes (configurable via telemetry.attributes.functionKeys, etc.)

      You'd configure a pipeline from your APM (Datadog, Honeycomb, Jaeger, whatever) to forward aggregated call counts to CKB's ingest endpoint. The matcher then correlates runtime function names to SCIP symbol IDs with confidence scoring (exact: file+function+line, strong: file+function, weak: namespace+function only).
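
      If it helps, the matching roughly reduces to the sketch below. Hypothetical shapes; the code.* attribute keys are just the standard OTel conventions used as an example, since the real keys are whatever you configure under telemetry.attributes.*:

        // Sketch only: illustrates the confidence tiers, not CKB's real matcher.
        type Confidence = "exact" | "strong" | "weak" | "none";

        interface RuntimeCall { file?: string; func?: string; line?: number; namespace?: string; }
        interface ScipSymbol  { id: string; file: string; func: string; line: number; namespace: string; }

        // Pull identity out of span attributes (key names configurable in CKB).
        function extractCall(attrs: Record<string, unknown>): RuntimeCall {
          return {
            func:      attrs["code.function"]  as string | undefined,
            file:      attrs["code.filepath"]  as string | undefined,
            line:      attrs["code.lineno"]    as number | undefined,
            namespace: attrs["code.namespace"] as string | undefined,
          };
        }

        function score(call: RuntimeCall, sym: ScipSymbol): Confidence {
          if (call.file === sym.file && call.func === sym.func && call.line === sym.line) return "exact";
          if (call.file === sym.file && call.func === sym.func) return "strong";
          if (call.namespace === sym.namespace && call.func === sym.func) return "weak";
          return "none";
        }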

      Full setup: https://codeknowledge.dev/docs/Telemetry

      The static analysis mode is probably enough to start with. Telemetry integration is for when you want "this code hasn't been called in 90 days" confidence rather than "this code has no static references."

  • storystarling 12 days ago
    Curious how the token optimization presets balance context window costs against the depth of call graph analysis. I've found that aggressively pruning context to save on input tokens often degrades reasoning quality pretty quickly when dealing with complex dependencies.
    • SimplyLiz 11 days ago
      The architecture separates tool availability from result depth, which addresses exactly that concern.

      Presets control tool availability, not output truncation. The core preset exposes 19 tools (~12k tokens for definitions) vs. 50+ tools in the full preset. This affects what the AI can ask for, not what it gets back. The AI can dynamically call expandToolset mid-session to unlock additional tools when needed.
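
      Mechanically that's just a normal MCP tools/call; roughly this shape (the argument below is a guess, check the docs for the real schema):

        // Sketch only: standard MCP tools/call envelope; the argument shape is hypothetical.
        const expandRequest = {
          jsonrpc: "2.0",
          id: 1,
          method: "tools/call",
          params: {
            name: "expandToolset",
            arguments: { preset: "full" }, // hypothetical argument
          },
        };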

      Depth parameters control which analyses run, not result pruning. For compound tools like explore:

      - shallow: 5 key symbols, skips dependency/change/hotspot analysis entirely
      - standard: 10 key symbols, includes deps + recent changes, parallel execution
      - deep: 20 key symbols, full analysis including hotspots and coupling

      This is additive query selection. The call graph depth (1-4 levels) is passed through unchanged to the underlying traversal—if you ask for depth 3, you get full depth 3, not a truncated version.
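
      In sketch form (made-up shapes, but this is the selection logic in spirit):

        // Sketch only: depth picks which analyses run; call graph depth passes through untouched.
        type Depth = "shallow" | "standard" | "deep";

        interface ExplorePlan {
          keySymbols: number;
          analyses: string[];     // which sub-analyses to run
          callGraphDepth: number; // 1-4, forwarded unchanged to the traversal
        }

        function planExplore(depth: Depth, callGraphDepth: number): ExplorePlan {
          switch (depth) {
            case "shallow":
              return { keySymbols: 5, analyses: [], callGraphDepth };
            case "standard":
              return { keySymbols: 10, analyses: ["dependencies", "recentChanges"], callGraphDepth };
            case "deep":
              return { keySymbols: 20, analyses: ["dependencies", "recentChanges", "hotspots", "coupling"], callGraphDepth };
          }
        }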

      On token optimization specifically: CKB tracks token usage at the response level using WideResultMetrics (measures JSON size, estimates tokens at ~4 bytes/token for structured data). When truncation does occur (explicit limits like maxReferences), responses include transparent TruncationInfo metadata with reason, originalCount, returnedCount, and droppedCount. The AI sees exactly what was cut and why.
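
      Shape-wise, think roughly this (field names from above; the rest is assumed):

        // Sketch only: approximate shapes, not CKB's exact definitions.
        interface TruncationInfo {
          reason: string;        // e.g. an explicit maxReferences limit was hit
          originalCount: number;
          returnedCount: number;
          droppedCount: number;
        }

        // The ~4 bytes/token heuristic for structured JSON responses.
        function estimateTokens(payload: unknown): number {
          return Math.ceil(JSON.stringify(payload).length / 4);
        }
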

      The compound tools (explore, understand, prepareChange) reduce tool calls by 60-70% by aggregating what would be sequential queries into parallel internal execution. This preserves reasoning depth while cutting round-trip overhead. The AI can always fall back to granular tools (getCallGraph, findReferences) when it needs explicit control over traversal parameters.