It uses Meta's AudioSeg model (https://github.com/facebookresearch/audioseg) under the hood. I also used this as a chance to learn Modal for GPU inference.
Code is open source: https://github.com/sambarrowclough/clearaudio
Happy to answer questions about the stack or take feedback on the UX!