raviadiprakoso 2 hours ago
This is my personal project for training neural networks on older AMD GPUs that have no ROCm support, using DirectX 11 as a substitute. That means you can train an AlexNet-style model or a ResNet on any Windows desktop that supports DX11. The roadmap includes Linux support via Vulkan and a migration to DX12 on Windows, so we no longer have to deal with the overhead of GroupMemoryBarrierWithGroupSync. I have verified the gradient computations against PyTorch, and they match exactly. Most of the time it reaches around 20-40% FLOPs efficiency (this depends heavily on VRAM bandwidth), and that could be improved further. The project is open source on GitHub, go check it out.
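
For anyone wondering what "FLOPs efficiency" means here, this is a rough sketch of how such a figure is usually computed: achieved throughput divided by the GPU's theoretical peak. The function names, the matrix sizes, the timing, and the peak value below are all made-up placeholders for illustration, not measurements or code from this project.

```python
def matmul_flops(m, n, k):
    """FLOPs for an (m x k) @ (k x n) matmul: one multiply and one add per term."""
    return 2 * m * n * k


def flops_efficiency(flops, seconds, peak_flops_per_s):
    """Fraction of the theoretical peak throughput that was actually achieved."""
    return (flops / seconds) / peak_flops_per_s


# Hypothetical inputs: a 1024^3 matmul taking 2 ms on a GPU with a
# 4 TFLOP/s theoretical peak. None of these are real benchmark numbers.
flops = matmul_flops(1024, 1024, 1024)
eff = flops_efficiency(flops, seconds=0.002, peak_flops_per_s=4.0e12)
print(f"{eff:.1%}")
```

For a whole training step you would sum the FLOPs of every layer's forward and backward pass and divide by the measured step time. Memory-bound layers will pull the number down, which is presumably where the VRAM bandwidth dependence comes from.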