The problems you mention matter in music synthesis, where people could live with limited reconfigurability but reliability was at a premium: synth players in early touring bands (e.g. Yes) had to be electronics technicians, and instruments had to survive being packed in boxes and transported everywhere. The Yamaha DX7 made FM synthesis mainstream because digital FM synthesis was absolutely reliable.
You’ve got to wonder: if you have an image generation demo, why would you possibly show 64 x 64 pixel output?
If I’m understanding this properly, to generate a 4K image you need something like 5 trillion point-to-point connections on the chip. Even if power use from the oscillators is zero, that’s going to be an issue.
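For a rough sense of scale, here is a back-of-envelope sketch. It assumes one oscillator per pixel and all-to-all coupling. Both are my assumptions for illustration and not necessarily the architecture in the paper, and the function name is made up. The exact figure depends on what you count (channels, directed vs. undirected links), but under these assumptions a 4K frame lands in the tens of trillions of couplings.

```python
def all_to_all_couplings(width: int, height: int, channels: int = 1) -> int:
    """Count undirected pairwise couplings, assuming (hypothetically) one
    oscillator per pixel per channel and every oscillator wired to every other."""
    n = width * height * channels  # number of oscillators
    return n * (n - 1) // 2        # number of unordered pairs


demo = all_to_all_couplings(64, 64)     # the 64 x 64 demo resolution
uhd = all_to_all_couplings(3840, 2160)  # 4K UHD, single channel

print(f"64 x 64 oscillators: {64 * 64:,}  couplings: {demo:,}")
print(f"4K oscillators: {3840 * 2160:,}  couplings: {uhd:,}")
print(f"growth factor in couplings: {uhd / demo:,.0f}x")
```

The point is only that all-to-all wiring grows with the square of the pixel count, so any sparser coupling scheme would change the estimate completely.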
These are cool results, but I was disappointed not to find any discussion of where oscillator array technology stands today or what the manufacturing challenges and opportunities might be. It seems like it would be prohibitively expensive for anything beyond minimal networks of a few hundred nodes that could be used in sensors. Even if you have perfectly consistent oscillators that synchronize to each other within very fine tolerances, wiring them up to each other is still a massive headache.
But specifically what they’ve simulated here? I don’t see how that would ever work in real life scaled up to any kind of real size.
I’m not criticizing them for starting out small. Lots of things can be proven with small models. I’m saying that, in principle, I don’t see how this will work unless there’s some fundamentally new technique that isn’t currently known. Maybe they have some secret idea, but they haven’t shown it here.
Do you mean that they may get away with fewer oscillators because of the decoder layer? Well, there’s the rub, isn’t it: the more work you have done by a software layer, the less power you’ve proportionally saved by having the rest done by physical computing.
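To put numbers on that trade-off, here is a minimal Amdahl-style sketch. The function name and the example fractions are hypothetical, picked only for illustration; nothing here is measured from the paper.

```python
def remaining_energy_fraction(offloaded_fraction: float,
                              hardware_cost_ratio: float = 0.0) -> float:
    """Fraction of the baseline energy still spent after offloading.

    offloaded_fraction: share of baseline energy in the work moved to the
        oscillator hardware (hypothetical input, between 0 and 1).
    hardware_cost_ratio: energy of the oscillator hardware relative to the
        digital work it replaces (0.0 means the hardware is free).
    """
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    software_part = 1.0 - offloaded_fraction
    hardware_part = offloaded_fraction * hardware_cost_ratio
    return software_part + hardware_part


# Even with free oscillators, the software share sets a floor on the savings.
for share in (0.5, 0.9, 0.99):
    left = remaining_energy_fraction(share)
    print(f"offloaded {share:.0%}: {left:.0%} of baseline energy remains")
```

With zero-cost oscillators, offloading half the work can never do better than halving the energy bill, which is the rub: whatever stays in the decoder caps the benefit.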
But let’s spitball here: what would you estimate is needed, in number of oscillators and interconnects, for a 4K image?
One thing I'm unclear on: their total parameter count scales similarly to conventional models, but many of those conventional models incorporate convolutions, which reuse the same weights across many connections. I wonder how interconnect count (as opposed to unique parameters) compares against performance?
As to 4K images, I'm not clear how much further their current architecture would be expected to scale. Single-layer networks aren't parameter efficient compared to deep networks; I'd naively assume that also applies here. That said, given their results so far with what amounts to a single layer, the naive assumption begins to seem questionable.
We can implement coupled oscillators in hardware, but are the couplings and frequencies programmable? If they're being streamed in, I guess you'd still have a memory bandwidth bottleneck and the associated energy usage. If not, then the fair comparison is to a conventional model hardcoded in an ASIC, which AFAIU is actually quite energy efficient.