I trained a 90-day weather AI on a single GPU using 150 years of data (github.com)

1 point by sentinelowl22 days ago | 1 comment

sentinelowl22 days ago
Hey HN,
I built LILITH, an open source ML weather prediction system that runs on consumer hardware. The model trains in 15 minutes on an RTX 3060, the checkpoint is 22MB, and inference takes under a second.
THE PROBLEM
GraphCast, Pangu-Weather, and similar models are impressive but require:
- ERA5 reanalysis data (controlled by ECMWF)
- 80GB+ VRAM for inference
- Institutional-scale compute
Meanwhile, NOAA's GHCN dataset covers 100K+ weather stations and 150+ years of data, and it is completely public domain.
THE APPROACH
Instead of requiring gridded reanalysis, LILITH learns directly from sparse station observations:
- Transformer encoder on 30 days of historical data
- Autoregressive decoder for multi-day prediction
- Multi-timescale rollout: 6h steps for days 1-14, daily for 15-42, weekly for 43-90
- Climate signal injection (ENSO, MJO) for extended range

Total parameters: 1.87M. You could email the checkpoint.
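To make the multi-timescale rollout concrete, here is a minimal sketch of how the decoder's lead times could be laid out. This is not the code from the repo: `SCHEDULE` and `rollout_lead_times` are hypothetical names, and clipping the last weekly step to day 90 is my assumption, since days 43-90 span 48 days, which is not a whole number of weeks.

```python
# Hypothetical schedule: (step size in hours, last forecast day using that step size).
SCHEDULE = [(6, 14), (24, 42), (168, 90)]


def rollout_lead_times(schedule=SCHEDULE):
    """Return forecast lead times in hours, one entry per decoder step."""
    lead_times = []
    t = 0
    for step_hours, last_day in schedule:
        segment_end = last_day * 24
        while t < segment_end:
            # Assumption: clip the final step so it lands on the segment boundary.
            t = min(t + step_hours, segment_end)
            lead_times.append(t)
    return lead_times


leads = rollout_lead_times()
uniform_steps = 90 * 24 // 6  # steps needed if every step were 6h

print(len(leads), "steps vs", uniform_steps, "at a uniform 6h step")
print(leads[:3], "...", leads[-3:])
```

Under these assumptions the schedule needs 91 autoregressive steps to reach day 90 (56 six-hourly, 28 daily, 7 weekly), versus 360 at a uniform 6h step. Fewer steps means fewer chances for errors to compound during rollout, which is the main reason to coarsen the step at long range.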
RESULTS
Trained on 915K sequences from 300 US stations:
- Temperature RMSE: 3.96C
- Temperature MAE: 3.01C
- Climatology baseline is ~7C RMSE
For context, this beats just predicting historical averages, though it is not GraphCast-accurate for short range. The value is accessibility, not beating ECMWF.
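For anyone who wants to sanity-check the comparison against climatology, here is a small sketch of the metrics involved. The function names and toy numbers are illustrative, not taken from the repo, and the baseline of 7.0 is just the approximate "~7C" figure quoted above.

```python
import math


def rmse(pred, obs):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mae(pred, obs):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse_skill(model_rmse, baseline_rmse):
    """Fractional RMSE reduction vs a baseline: 0 = no better, 1 = perfect."""
    return 1.0 - model_rmse / baseline_rmse


# Toy temperatures in degrees C, purely illustrative.
obs = [10.0, 12.0, 15.0, 11.0]
pred = [11.0, 10.0, 15.0, 14.0]
print(rmse(pred, obs), mae(pred, obs))

# Reported model RMSE vs the approximate climatology baseline.
print(round(rmse_skill(3.96, 7.0), 3))  # 0.434
```

With the reported numbers this works out to roughly a 43% RMSE reduction over climatology, keeping in mind that the baseline figure is approximate.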
HONEST LIMITATIONS
- Days 1-7 are worse than operational models
- 90-day "forecasts" are really climate outlooks, not weather predictions
- Currently US stations only
- No ensemble/uncertainty quantification yet
TECH STACK
- PyTorch 2.x with Flash Attention
- FastAPI backend
- Next.js 14 frontend with glassmorphism UI
- Trains on 8GB VRAM with mixed precision

The frontend has interactive 90-day charts, a station command center showing all 300 stations with predicted vs actual temps, and historical data exploration.
WHY IT MATTERS
Weather prediction has been an institutional monopoly. The data is public, consumer GPUs are powerful enough, and transformer architectures are well understood. There is no reason useful forecasting should be locked behind institutional walls.
Would love feedback on the station-native approach vs requiring ERA5, and whether the multi-timescale rollout makes sense for extended range.