1 point by sutaniese 3 hours ago | 1 comment
  • sutaniese 3 hours ago
    Hey everyone,

    We’ve all been there. You have a cool idea for a model or a RAG pipeline, but before you can do anything interesting, you’re stuck in "Data Hell" for three hours. You’re jumping between tabs to find a dataset, manually checking for missing values, realizing the schema is a mess, and praying there’s no PII (emails/phones) hidden in the CSV. It’s tedious, repetitive, and frankly, it’s the reason many projects die before the first training run.

    I decided to fix this by building Vesper. It’s a Model Context Protocol (MCP) server that turns your AI into a full-stack data engineer. Instead of writing cleaning scripts yourself, you just tell your AI what you need.

    Here is what Vesper actually does:

    - Universal Search: Query thousands of datasets across HuggingFace, Kaggle, and specialized sources like UCI, GitHub, World Bank, and NASA in a single search.
    - Deep Quality Analysis: Runs automated audits to detect outliers, duplicates, and schema anomalies (like numbers stored as strings).
    - Multimodal Support: Beyond tabular data (CSV/Parquet), it handles images, audio, and video, including automated annotation and quality checks.
    - Self-Healing Pipelines: Automatically generates a cleaning plan to impute missing values, remove outliers using IQR, and encode categorical data.
    - JIT Ingestion & Performance: Downloads data on demand and uses Dask or Spark for distributed processing of massive datasets.
    - Privacy & Compliance: Vesper never sees your data; everything runs locally.
    - Async Job Management: Long-running tasks run in the background with live progress bars streamed directly to your chat interface.
    - Developer Collaboration: Features self-versioning, personalized recommendations, and easy export to Jupyter Notebooks or Git.

    I’m opening a Waitlist today because I need feedback from people who actually deal with messy data every day. I want to know which "janitor" tasks you hate the most so I can refine the engine.
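
To make the "cleaning plan" idea concrete, here is a minimal sketch of three of the steps mentioned above for a single numeric column: coercing numbers stored as strings, dropping outliers with the IQR rule, and imputing missing values with the median. This is not Vesper's code or API. The function names (`coerce_numeric`, `iqr_bounds`, `clean_column`), the 1.5 multiplier, and the choice of median imputation are all assumptions made for illustration, using only the Python standard library.

```python
import math
import statistics


def coerce_numeric(values):
    """Turn numeric strings into floats; blanks and unparseable cells become None."""
    out = []
    for v in values:
        try:
            x = float(v)
        except (TypeError, ValueError):
            out.append(None)
            continue
        # Treat NaN and infinities as missing rather than as real measurements.
        out.append(x if math.isfinite(x) else None)
    return out


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at k * IQR outside the first and third quartiles."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean_column(raw, k=1.5):
    """Hypothetical cleaning plan: coerce, drop IQR outliers, impute with the median."""
    nums = coerce_numeric(raw)
    low, high = iqr_bounds(nums, k)
    # Keep missing cells for now; drop only values outside the fences.
    kept = [v for v in nums if v is None or low <= v <= high]
    median = statistics.median(v for v in kept if v is not None)
    return [median if v is None else v for v in kept]


if __name__ == "__main__":
    raw = ["10", "11", "", "12", "12", "n/a", "13", "11", "12", "500"]
    print(clean_column(raw))
```

The outliers are dropped before imputing so that an extreme value like 500 does not influence the fill value. A real pipeline would more likely operate on whole rows with pandas, Dask, or Spark, as the post mentions.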

    (Sorry for using Lovable. I used it to spin up a waitlist quickly for validation while I focus on the tech.) I'll be hanging out in the comments to answer anything technical! Thanks!