1. ISPC [1], the Intel® Implicit SPMD Program Compiler, also compiles SIMD programs with branches and other control flow efficiently, using predication/masking etc.
2. Futhark [2] compiles nice-looking functional programs into efficient parallel GPU/CPU code.
[1]: https://github.com/ispc/ispc
[2]: https://futhark-lang.org/
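A rough sketch of what "predication/masking" means for control flow. This is a conceptual emulation, not ISPC's actual code generation: each list element stands in for one SIMD lane, and `masked_branch` / `masked_loop` are hypothetical names. For a branch, both sides are computed for every lane and a per-lane mask selects which result each lane keeps. For a loop whose trip count differs per lane, the whole group keeps iterating while any lane is still active, and lanes that have finished are masked off so their state stays unchanged.

```python
# Conceptual emulation of SIMD predication: each list element is one "lane".
# Hypothetical helper names; not ISPC's real output.


def masked_branch(x):
    """Predicated form of: y = x * 2 if x > 0 else -x, applied per lane."""
    mask = [v > 0 for v in x]         # evaluate the condition for every lane
    then_vals = [v * 2 for v in x]    # "then" side computed for ALL lanes
    else_vals = [-v for v in x]       # "else" side computed for ALL lanes
    # Blend/select: each lane keeps the result its mask picks.
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]


def masked_loop(x):
    """Per-lane count of halvings until v <= 1 (trip count varies by lane)."""
    v = list(x)
    steps = [0] * len(x)
    while True:
        active = [val > 1 for val in v]   # loop condition becomes the execution mask
        if not any(active):               # leave the loop only when every lane is done
            break
        # Masked-off lanes keep their current state; active lanes advance.
        v = [val // 2 if a else val for val, a in zip(v, active)]
        steps = [s + a for s, a in zip(steps, active)]  # bool adds as 0 or 1
    return steps


x = [5, -3, 0, 8, 1, -7, 2, 16]
print(masked_branch(x))  # [10, 3, 0, 16, 2, 7, 4, 32]
print(masked_loop(x))    # [2, 0, 0, 3, 0, 0, 1, 4]
```

The cost of this approach is that lanes on the untaken side (or already-finished loop lanes) still occupy the vector unit, which is why heavily divergent code loses SIMD efficiency.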
Does it have some loophole to allow loops? Or does it only allow linear execution?
Can it read or write external memory (allocated by another language such as C++)?