Basically, I encoded the image as a binary bitstream and then, using a carefully designed encoding scheme (based on features such as the presence of Oxford commas, discourse markers, etc.), encoded those bits into ~200 Wikipedia edits. The edit IDs (used to retrieve the edited sentences and hence reconstruct the image) were stored in a playable .WAV file.
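
To make the idea concrete, here is a minimal sketch of hiding bits in a single stylistic feature. It assumes one bit per sentence, with an Oxford comma meaning 1 and its absence meaning 0. The function names (`bytes_to_bits`, `embed_bit`, `extract_bit`, `bits_to_bytes`), the regex, and the sample sentence are all hypothetical and only illustrate the principle; the actual scheme used several features and real Wikipedia sentences.

```python
import re

# Assumption: bit 1 = Oxford comma present, bit 0 = absent.
# Matches a comma directly before a standalone "and"/"or".
OXFORD = re.compile(r",\s+(and|or)\s")


def bytes_to_bits(data: bytes) -> list[int]:
    """Flatten bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits: list[int]) -> bytes:
    """Pack bits (MSB first) back into bytes, ignoring a trailing partial byte."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def embed_bit(sentence: str, bit: int) -> str:
    """Add or remove the comma before the first standalone 'and'/'or'."""
    # First normalise to the "no Oxford comma" form.
    stripped = OXFORD.sub(lambda m: f" {m.group(1)} ", sentence, count=1)
    if bit == 0:
        return stripped
    # Then insert the comma before the first conjunction.
    return re.sub(r"\s+(and|or)\s", lambda m: f", {m.group(1)} ", stripped, count=1)


def extract_bit(sentence: str) -> int:
    """Read the hidden bit back from a sentence."""
    return 1 if OXFORD.search(sentence) else 0


# Round trip: one carrier sentence per bit of the payload.
carrier = "We bought apples, pears and plums."
payload = b"\xa5"
edited = [embed_bit(carrier, bit) for bit in bytes_to_bits(payload)]
recovered = bits_to_bytes([extract_bit(s) for s in edited])
```

With only one binary feature, each sentence carries a single bit. Combining several independent features per sentence (discourse markers, for example) would raise the capacity per edit, which is presumably how the image fit into ~200 edits.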