Show HN: TabPFN-2.5 – SOTA foundation model for tabular data (priorlabs.ai)

73 points by onasta 3 months ago | 9 comments

zurfer 3 months ago
The current go-to solution for the kinds of problems that TabPFN is solving would be something like XGBoost. In general it's a good baseline, but the challenge is always that you need to spend a lot of time on feature engineering and tweaking the data representation before something like XGBoost can deliver good performance on your regression or classification problems.
For me the promise of foundation models for tabular data is that there are enough generalizable patterns, so that you need less manual feature engineering and data cleaning.
And kudos to the team, I think it's a really creative application of neural networks. I was always frustrated with neural networks, since they were hard to tune on "structured" data and always under-performed (for me), but we also never had real foundation models for structured data.
- noahho 3 months ago
  Less feature engineering is definitely something we are aiming for. The current version is actually based only on statistics; the real-world connections between features are something we're working on right now, and we hope to show results soon. That's the next step.
TheTaytay 3 months ago
Looks really cool. In reading through the FAQ, it says this: Q: "How are text features handled?" A: "In the local package version text features are encoded as categoricals without considering their semantic meaning. Our API automatically detects text features and includes their semantic meaning into our prediction. The local package version encodes text as numerical categories and does not include semantic meaning."
So that means that automatic embedding/semantic meaning is reserved for API use of TabPFN, right? Otherwise, if I use it locally, it's going to assign each of my distinct text values an arbitrary int, right?
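To make the local behaviour concrete, here is a minimal sketch of what "encoding text as numerical categories" amounts to. This is not TabPFN's actual code: `encode_as_categories` is a hypothetical helper, and assigning codes in order of first appearance is an assumption made purely for illustration.

```python
def encode_as_categories(values):
    """Map each distinct string to an integer code.

    Hypothetical helper for illustration only; codes are assigned in
    order of first appearance, which is an assumption, not TabPFN's rule.
    """
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            # A new distinct string gets the next unused integer.
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


reviews = ["great product", "terrible", "excellent product", "terrible"]
encoded, codes = encode_as_categories(reviews)
print(encoded)  # [0, 1, 2, 1]
```

Note that "great product" and "excellent product" end up as unrelated integers (0 and 2), so any similarity in meaning is lost. That is the gap a semantic embedding would be expected to close.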
- noahho 3 months ago
  Yes, exactly. The API is the best way to handle text features. The actual semantics often matter a lot. Is the API an option for you, or would you need this locally?
scorpion7 3 months ago
It's fascinating how this works with such a small model, especially given that the training is a kind of meta-learning of "how to do in-context learning". I wonder, is there a good intuition for the role of the MLP in this architecture? For LLMs the consensus seems to be that they store knowledge... what would that be for tabular data?
vessenes 3 months ago
I think you need a custom benchmark -- have you considered making one out of the Excel World Championship?
aitchnyu 3 months ago
Are applications using table models supposed to load the entire thing into context? As a CRUD app guy, I either ask the AI to read the entire table if it's small, or use/make scripts to analyze it if it's big.
enigmaa99 3 months ago
I have been using it for a year now for benchmarking, and the improvements with 2.5 look massive. Many of the use cases discussed in the report will help interdisciplinary domains improve their predictions.
dill_1 3 months ago
Tabular data is still underrated!
- noahho 3 months ago
  When we released TabPFNv1 over three years ago, I didn’t expect at all the hundreds of comments and reposts we would see. Tabular data had been a field getting little love from AI research, but we immediately felt that this was a topic that data scientists, scientists, financial analysts, and enterprise users deeply cared about. Glad it's useful to people!
abracos 3 months ago
How does it compare to AutoML tools?
- noahho 3 months ago
  TabPFN-2.5 default (one forward pass) matches AutoGluon 1.4 tuned for four hours. AutoGluon is the strongest AutoML system: it stacks models such as XGBoost and CatBoost, and even includes the previous TabPFNv2.
klemens_floege 3 months ago
Good stuff!