My first instinct was to use Slurm or AWS Batch, but we started running into problems once we tried to go multi-cloud. We're also optimizing for being able to onboard an arbitrary codebase as fast as possible, so building a custom setup natively compatible with our containers (which are now built automatically from Linux machines with the relevant models deployed) has been helpful.
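To make that concrete, here's a minimal sketch of the pattern, not Tamarind's actual scheduler: the host names, `Job` shape, and `pick_host` heuristic are all hypothetical. The idea is that each tool ships as a container image baked with the model already deployed, so dispatch is just a `docker run` on whichever VM has capacity, regardless of cloud.

```python
# Hypothetical sketch of a cloud-agnostic dispatcher, not Tamarind's internals.
# The only thing a worker needs is Docker, so hosts can live on any cloud.
import subprocess
from dataclasses import dataclass, field

@dataclass
class Job:
    image: str                                 # container baked with the model deployed
    args: list = field(default_factory=list)   # tool-specific CLI arguments

# Illustrative worker pool spanning clouds; real capacity tracking would
# consider queue depth, GPU availability, spot pricing, etc.
HOSTS = ["gcp-worker-1", "aws-worker-1"]

def pick_host() -> str:
    return HOSTS[0]  # stand-in for a real placement heuristic

def dispatch(job: Job) -> int:
    host = pick_host()
    cmd = ["ssh", host, "docker", "run", "--rm", job.image, *job.args]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    dispatch(Job(image="registry.example.com/alphafold:latest",
                 args=["--fasta", "/inputs/seq.fasta"]))
```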
What were the biggest challenges in getting major pharma companies onboard? How do you think it was the same or different compared to previous generations of YC companies (like Benchling)?
Some of the same problems exist: large enterprises don't want to process their un-patented, future billion-dollar drug via a startup, because leaking data could destroy 10,000 times the value of the product being bought.
Pharma companies are especially not used to buying products vs. research services. There are also historical issues with the industry not being served by high-quality software, so building custom things internally has become a habit.
But I think the biggest unlock was just that the tools are actually working as of a few years ago.
If you look at the recent research on ML/AI applications in biology, most of the work has not provided any tangible benefit for the drug discovery pipeline (e.g. clinical trial efficiency, drugs with low ADR/high efficacy).
The only areas showing real benefit have been off-the-shelf LLMs for streamlining informatics work, and protein folding/binding research. But protein structure work is arguably a tiny fraction of the overall cost of bringing a drug to market, and the space is massively oversaturated right now, with dozens of startups chasing the same solved problem post-AlphaFold.
Meanwhile, the actual bottlenecks—predicting in vivo efficacy, understanding complex disease mechanisms, navigating clinical trials—remain basically untouched by current ML approaches. The capital seems to be flowing to technically tractable problems rather than commercially important ones.
Maybe you can elaborate on what you're seeing? But from where I'm sitting, most VCs funding bio startups seem to be extrapolating from AI success in other domains without understanding where the real value creation opportunities are in drug discovery and development.
So both things can be true: the more important bottlenecks remain, but progress on discovery work has been very exciting.
Runs vary significantly between the models/protocols used: some generative models can take several hours, while others run in a few seconds. We have tools that screen against databases if the goal is to find an existing molecule to act against the target, but often people will import an existing starting point and modify it, or design completely novel molecules on the platform.
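For the database-screening case, here's a toy illustration of what "find an existing molecule against a target" can reduce to. This is generic RDKit similarity search, not Tamarind's actual pipeline; the two-entry library and the 0.4 cutoff are invented for the example.

```python
# Toy database screen: rank known molecules by Tanimoto similarity to a query.
# Generic RDKit usage, not Tamarind's pipeline; library and cutoff are made up.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

query = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin as a stand-in query
query_fp = AllChem.GetMorganFingerprintAsBitVect(query, radius=2, nBits=2048)

library = {  # in practice this would be a vendor catalog or internal DB
    "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "salicylic acid": "O=C(O)c1ccccc1O",
}

hits = []
for name, smiles in library.items():
    fp = AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles), radius=2, nBits=2048)
    sim = DataStructs.TanimotoSimilarity(query_fp, fp)
    if sim >= 0.4:  # arbitrary cutoff for the example
        hits.append((name, sim))

print(sorted(hits, key=lambda h: -h[1]))
```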
We do let people onboard their own models too. Users basically see a separate tab for their org, which is where all the scripts, docker images, and notebooks their developers built interfaces for live on Tamarind.
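I don't know the exact shape of that org-specific onboarding, but conceptually each custom tool could reduce to a small manifest like this. Every field name here is invented for illustration; Tamarind hasn't published a schema like this.

```python
# Purely illustrative manifest for onboarding an org's own model;
# every field and value below is hypothetical.
custom_tool = {
    "org": "acme-bio",
    "name": "acme-binder-ranker",
    "image": "registry.acme.com/binder-ranker:1.3",  # customer-built docker image
    "entrypoint": ["python", "rank.py"],
    "inputs": {"complex_pdb": "file", "chains": "string"},
    "outputs": {"scores_csv": "file"},
    "compute": {"gpu": "A100", "timeout_minutes": 120},
}
```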
I would say the primary concerns were:
dependency issues: consuming a model takes more than its weights alone (Multiple Sequence Alignment needs to be split out into its own always-on server, and so on), so it's more convenient if the inputs and outputs are hardened interfaces, with each model in its own environment
Our general finding in BioML is that the models are not at all standardized, especially compared to the diffusion model world, for example, so treating each one with its own often-weird dependencies helped us get more tools out quicker.
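Concretely, the "hardened interface" idea can be as thin as files-in/files-out through a per-model container. This is a sketch of the pattern only; the /io mount contract is an assumption for the example, not Tamarind's internals.

```python
# Sketch of the hardened-interface pattern: each model's dependency mess
# (MSA servers, CUDA pins, conda envs) stays inside its own image, and the
# platform only sees a JSON-in/JSON-out contract. Mount layout is illustrative.
import json
import subprocess
import tempfile
from pathlib import Path

def run_tool(image: str, inputs: dict) -> dict:
    workdir = Path(tempfile.mkdtemp())
    (workdir / "input.json").write_text(json.dumps(inputs))
    subprocess.run(
        ["docker", "run", "--rm", "-v", f"{workdir}:/io", image],
        check=True,  # container reads /io/input.json, writes /io/output.json
    )
    return json.loads((workdir / "output.json").read_text())
```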
We actually did have this available early on. Our rationale for structuring it differently now is basically that there's a lot of diversity in how people use us: we have examples where a twenty-person biotech company consumes more inference than a several-hundred-person org, each tool has very different compute requirements, and people may not know exactly which model they'll be using. Basically, we weren't able to let people calculate the usage, annual commitment, and integration/security requirements all in one place.
We do have a free tier, which tends to give a decent estimate of usage hours, and a form you can fill out so we can get back to you with a more precise price.
I think most large companies have similar expectations around security requirements, so once those are resolved, most IT teams are on your side. We occasionally do specific things like allowing our product to run in a VPC on the customer's cloud, but I imagine this is just what most enterprise-facing companies do.