This is backwards. The model is the easy part. Getting good data is 99% of the job, and nearly any clown can make a good model once you hand them a good dataset.
If you hand me a clean, well-labeled, representative dataset, I can make the model do a respectable little dance by lunch.
If you hand me a Kaggle CSV with duplicated rows, target leakage, mislabeled outcomes, and columns named final_final_v2_REAL, suddenly I’m not doing ML anymore. I’m doing archaeology with a red nose on.
The model is the balloon animal. The dataset is the elephant you had to drag into the tent.