> Over the past few years, inference-specific chip start-ups were experiencing a sort of Cambrian explosion, with different companies exploring distinct approaches to speed up the task. The start-ups include D-matrix with digital in-memory compute, Etched with an ASIC for transformer inference, RainAI with neuromorphic chips, EnCharge with analog in-memory compute, Tensordyne with logarithmic math to make AI computations more efficient, FuriosaAI with hardware optimized for tensor operation rather than vector-matrix multiplication, and others.

Let us add Taalas, which implemented Llama3 8b directly in hardware to achieve 17,000 tokens/s at a small fraction of the usual power consumption. You can test the output quality at https://chatjimmy.ai/