10 points by nbosse 3 hours ago | 1 comment
  • nbosse 3 hours ago
    We built this after too many rounds of deduplicating messy data. Each technique in the deduplication funnel solves what the previous one can't, but the real pain is orchestrating all three together at scale: chunking to avoid O(n²) comparisons, batching LLM calls (accuracy degrades past ~25 items per call), and rate limiting across the embedding and completion APIs at the same time. We packaged the pipeline into a Python SDK. Here's a 500-row CRM dataset that took ~100 seconds and cost $0.74 to dedupe: https://everyrow.io/docs/resolve-entities-python
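    To make the chunking and batching points concrete, here is a minimal sketch in plain Python. It is not the SDK's code: `blocking_key`, `chunk_records`, `candidate_pairs`, and `batches` are hypothetical names, and the three-character prefix key is an assumed stand-in for whatever chunking the real pipeline uses. Only the ~25-item batch limit comes from the comment above.

```python
from collections import defaultdict
from itertools import combinations

MAX_BATCH = 25  # from the comment above: accuracy degrades past ~25 items


def blocking_key(name: str) -> str:
    """Hypothetical cheap key: first 3 alphanumeric characters, lowercased."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum())
    return cleaned[:3]


def chunk_records(names):
    """Group records by blocking key so comparisons stay inside each chunk."""
    blocks = defaultdict(list)
    for name in names:
        blocks[blocking_key(name)].append(name)
    return list(blocks.values())


def candidate_pairs(names):
    """Pairs within each chunk only, instead of all n*(n-1)/2 pairs."""
    pairs = []
    for block in chunk_records(names):
        pairs.extend(combinations(block, 2))
    return pairs


def batches(items, size=MAX_BATCH):
    """Split work into lists of at most `size` items, one list per LLM call."""
    return [items[i:i + size] for i in range(0, len(items), size)]


names = ["Acme Corp", "ACME Corporation", "Globex", "Globex Inc.", "Initech"]
pairs = candidate_pairs(names)    # 2 pairs instead of the 10 from all-vs-all
llm_calls = batches(pairs)        # each entry would go out as one request
```

    A prefix key like this misses duplicates that differ in their first characters, which is one reason a single technique is not enough on its own.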