Deterministic and AI-agent broker import to prevent portfolio data corruption (www.portfolio-terminal.com)

1 point by julien_devv 6 hours ago | 2 comments

julien_devv 6 hours ago
Hi HN — I’m building Portfolio Terminal to solve a specific problem:
Broker exports are inconsistent (CSV/JSON/PDF, locale formats, ISIN/ticker mismatches), and one bad parse can silently corrupt portfolio cost basis.
I ended up with a deterministic-first + AI-fallback pipeline:
- Step 1 (deterministic parser): parse JSON/CSV locally, including semicolon CSV, EU/US number formats, date-first rows, and duplicate symbol aggregation.
- Step 2 (AI import agent): only if local parsing fails, call an LLM (text/vision) to extract strict JSON positions.
- Step 3 (guardrails): normalize symbols, reject invalid qty/price rows, attach confidence per row.
- Step 4 (human verification): user reviews/edits rows before write.
- Step 5 (state-safe apply): import in either snapshot or additive mode, with explicit transaction synthesis (BUY/SELL diffs) and conservative avg-price rules to avoid cost-basis drift.
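To make the "EU/US number formats" part of Step 1 concrete, here is a minimal sketch of strict, locale-aware number parsing. The function name `parseStrictNumber` and the explicit `locale` argument are assumptions for illustration, not the actual Portfolio Terminal code. Passing the locale explicitly avoids guessing whether "1.234" means 1.234 or 1234.

```typescript
// Hypothetical helper (not the project's real API): strict number parsing
// with an explicit locale, returning null instead of guessing.
type NumberLocale = "us" | "eu";

function parseStrictNumber(raw: string, locale: NumberLocale): number | null {
  // US: "," groups thousands and "." is the decimal mark; EU is the reverse.
  const group = locale === "us" ? "," : ".";
  const decimal = locale === "us" ? "." : ",";
  const esc = (c: string): string => (c === "." ? "\\." : c);

  // Accept either properly grouped digits (1,234,567) or plain digits (1234567),
  // followed by an optional decimal part. Anything else is rejected.
  const pattern = new RegExp(
    `^[+-]?(\\d{1,3}(${esc(group)}\\d{3})+|\\d+)(${esc(decimal)}\\d+)?$`
  );

  const s = raw.trim();
  if (!pattern.test(s)) return null;

  // Drop group separators, then switch the decimal mark to "." for Number().
  const normalized = s.split(group).join("").replace(decimal, ".");
  const value = Number(normalized);
  return Number.isFinite(value) ? value : null;
}
```

With this shape, `parseStrictNumber("1.234,56", "eu")` and `parseStrictNumber("1,234.56", "us")` both yield 1234.56, while a cell like "qty=10" yields null and the row can be rejected or routed to the fallback path.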
I also added a separate AI “Oracle” agent for portfolio Q&A with structured output: { answer, confidence, metricsUsed, why } — to keep responses explainable and auditable.
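For illustration, the Oracle's structured output could be revalidated before it is shown to the user, roughly like this. Only the four field names come from the shape above. The field types (string answer, numeric confidence in [0, 1], string array for metricsUsed) and the name `parseOracleAnswer` are assumptions.

```typescript
// Assumed field types; only the four field names come from the post.
interface OracleAnswer {
  answer: string;
  confidence: number; // assumption: a score between 0 and 1
  metricsUsed: string[];
  why: string;
}

// Hypothetical validator: returns null for anything that is not exactly
// the expected shape, so malformed model output is never rendered.
function parseOracleAnswer(json: string): OracleAnswer | null {
  let data: unknown;
  try {
    data = JSON.parse(json);
  } catch {
    return null; // not valid JSON at all
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return null;
  }

  const { answer, confidence, metricsUsed, why } = data as Record<string, unknown>;

  if (typeof answer !== "string" || typeof why !== "string") return null;
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) {
    return null; // also rejects NaN
  }
  if (!Array.isArray(metricsUsed)) return null;
  if (!metricsUsed.every((m) => typeof m === "string")) return null;

  return { answer, confidence, metricsUsed, why };
}
```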
Stack: Next.js 16, React 19, TypeScript, Prisma/Postgres.
I’d love technical feedback on: 1) deterministic vs LLM boundary, 2) import invariants I should enforce, 3) audit trail design for replay/debug of imports.
Demo: https://www.portfolio-terminal.com/demo
julien_devv 6 hours ago
A few concrete implementation details:
- Local parser intentionally rejects ambiguous free text and only accepts strict tabular/JSON shapes.
- Number parsing is strict (e.g., rejects mixed alphanumeric like “qty=10”), with locale normalization.
- Symbol normalization handles common aliases (e.g., “S&P 500” -> SPY, “Bitcoin” -> BTC-USD).
- In additive mode, avg price is only recomputed when incoming qty > 0 and incoming avg is reliable.
- In snapshot mode, absent imported symbols are explicit delete candidates (normalized set diff).
- LLM path is schema-constrained in prompt and then revalidated server-side before persistence.
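The additive and snapshot rules above can be sketched as follows. This is a simplified illustration under stated assumptions: the `avgReliable` flag, the quantity-weighted average formula, the trim-and-uppercase `normalizeSymbol`, and all type and function names are hypothetical, and alias mapping is left out.

```typescript
interface Position {
  symbol: string;
  qty: number;
  avgPrice: number;
}

// Hypothetical: the post does not say how "reliable" is represented.
interface IncomingRow extends Position {
  avgReliable: boolean;
}

// Simplified normalization (trim + uppercase); alias mapping is omitted.
const normalizeSymbol = (s: string): string => s.trim().toUpperCase();

// Snapshot mode: symbols currently held but absent from the import become
// delete candidates. They are returned for review, not deleted here.
function deleteCandidates(current: Position[], imported: Position[]): string[] {
  const importedSet = new Set(imported.map((p) => normalizeSymbol(p.symbol)));
  const currentSet = new Set(current.map((p) => normalizeSymbol(p.symbol)));
  return Array.from(currentSet)
    .filter((s) => !importedSet.has(s))
    .sort();
}

// Additive mode: quantity always accumulates, but the average price is only
// recomputed when incoming qty > 0 and the incoming average is reliable.
// The quantity-weighted average is an assumed formula.
function applyAdditive(existing: Position, incoming: IncomingRow): Position {
  const qty = existing.qty + incoming.qty;
  const recompute = incoming.qty > 0 && incoming.avgReliable && qty > 0;
  const avgPrice = recompute
    ? (existing.qty * existing.avgPrice + incoming.qty * incoming.avgPrice) / qty
    : existing.avgPrice; // conservative: keep the known cost basis
  return { symbol: existing.symbol, qty, avgPrice };
}
```

For example, holding 10 @ 100 and importing 10 @ 120 with a reliable average gives 20 @ 110, while the same import flagged unreliable gives 20 @ 100.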
Goal is not “fully autonomous AI”, but reducing manual import pain while keeping deterministic safety and user control.