10 points by nbosse 3 hours ago | 1 comment

nbosse 3 hours ago
We built this after one too many rounds of deduplicating messy data. Each technique in the deduplication funnel catches what the previous one can't, but the real pain is orchestrating all three together at scale: chunking the data to avoid O(n²) pairwise comparisons, batching LLM calls (accuracy degrades past ~25 items per batch), and rate limiting across the embedding and completion APIs at the same time. We packaged the pipeline into a Python SDK. Here's an example that dedupes a 500-row CRM dataset in ~100 seconds for $0.74: https://everyrow.io/docs/resolve-entities-python
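For anyone curious what the chunking and batching steps look like in general, here is a minimal sketch in plain Python. It is not the SDK's API or its actual implementation. It assumes "chunking" means blocking: only records that share a cheap key get compared, instead of all n·(n−1)/2 pairs. The blocking key (first three alphanumeric characters of a hypothetical `company` field) and the function names are made up for illustration; a real pipeline might block on embedding similarity instead. The batch size of 25 is the limit mentioned above.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, Iterator


def block_key(record: dict) -> str:
    """Hypothetical blocking key: first 3 alphanumeric chars of the lowercased company name."""
    name = "".join(ch for ch in record["company"].lower() if ch.isalnum())
    return name[:3]


def candidate_pairs(records: list[dict]) -> Iterator[tuple[int, int]]:
    """Yield index pairs only within each block, avoiding the full O(n^2) comparison."""
    blocks: dict[str, list[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[block_key(rec)].append(i)
    for members in blocks.values():
        yield from combinations(members, 2)


def batched(items: Iterable, size: int = 25) -> Iterator[list]:
    """Group items into lists of at most `size`, e.g. one LLM prompt per batch."""
    batch: list = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Hypothetical sample records for illustration.
records = [
    {"company": "Acme Corp"},
    {"company": "ACME Corporation"},
    {"company": "Globex"},
    {"company": "Globex Inc."},
    {"company": "Initech"},
]

pairs = list(candidate_pairs(records))
print(pairs)  # [(0, 1), (2, 3)]

# Each batch of at most 25 candidate pairs would become one LLM call.
for batch in batched(pairs, size=25):
    print(len(batch))  # 2
```

On Python 3.12+, `itertools.batched` does the same grouping (yielding tuples rather than lists). Rate limiting across the embedding and completion APIs would sit around the per-batch calls and is omitted here.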