DeepSeek open-sources inference optimizations with 60–85% faster generation [pdf](github.com)

447 points by aurenvale 3 hours ago | 15 comments

kamranjon2 hours ago
DeepSeek continues to not only push the boundaries but also publish these incredible papers explaining how they achieved their gains - something the American labs no longer do unfortunately. Chinese labs are doing the most interesting work in AI right now.
- tomalaci2 hours ago
  Probably because American AI companies are on the hook for quite a lot of investment money. I think they are trying to find the magical moat to justify their valuation.
  Revealing optimizations similar to these would pretty much reduce their competitive position.
  - lwansbrough2 hours ago
    Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.
    I suspect their tune will change if they ever take the lead..
    c7b6 minutes ago
    The question is also what game they're playing. Deepseek came out of a hedge fund. I think it's no coincidence that their publications tend to have a large impact on AI stock prices.
    Destroying the growth story of overvalued stocks is an interesting investment strategy. It's not even new. Shortsellers understandably get terrible rep from execs, but their actions are more often in the public interest than you'd think. Normally it's exposing fraud, but here we get the really fortunate side benefit of what could eventually amount to the most significant contribution to the general software community since Linux.
    oefrha2 hours ago
    Which is a good thing. Self-serving motives are more reliable than altruistic ones.
    intendedan hour ago
    The world runs on incentives. Altruism and self-interest are downstream of that.
    Wikipedia is altruistic, and serves humanity quite well.
    theturtletalksan hour ago
    Open-source is also altruistic. If DeepSeek does become self-serving once they get the top spot, it doesn’t take away from the altruistic contributions that they made towards open models.
    wqaatwt9 minutes ago
    > Open-source is also altruistic
    Contributing to it might not necessarily be. Most open source development is funded by large companies after all, and from their perspective it can function as a cost-saving measure: it lets them focus on their core products and removes the possibility of their rivals getting a competitive advantage from a superior low-level stack under their product.
    Which is why open source is so successful in areas where software is a cost-center but mostly failed for consumer products (since spending resources on them would actually be altruistic unlike e.g. Linux kernel development)
    brookst39 minutes ago
    And ultimately the motivation for those contributions just doesn’t matter, except to those who like to anthropomorphize companies and argue about their souls.
    kelipso5 minutes ago
    Or if they want to do anything close to predicting what they will do in the future, like curious and interested humans tend to want to do.
    Dibby0536 minutes ago
    People who donated to OpenAI in its early years might disagree on that.
    amelius2 hours ago
    You mean more predictable, not more reliable.
    rrvsh2 hours ago
    Could you explain? (asking in good faith)
    IshKebaban hour ago
    I don't think so. I can confidently predict that altruism will give you a very unreliable income stream in the vast majority of cases.
    nubg2 hours ago
    Very interesting take
    broodbucket2 hours ago
    Look at how far OpenAI has drifted from their original mission. Everything comes back to greed, so it's ideal for the world if selfish motives happen to coincide with what's good for the world, like advancements in open models
    roenxi2 hours ago
    It's a standard take since it is how markets tend to work. They aren't powered by altruism, it is a big system for turning greed into good results. We don't have all this stuff because people suddenly woke up one morning and decided to be nice.
    breezybottom41 minutes ago
    Yes but there's more to the world than markets.
    wqaatwt5 minutes ago
    On aggregate mainly because humans often tend to behave “irrationally” due to various reasons though
    lelanthranan hour ago
    I don't understand what is interesting about it: it's the default.
    Markets don't run on altruism.
    woctordho42 minutes ago
    And humans don't run on markets.
    wqaatwt2 minutes ago
    Mostly they kind of do, since we don't live in a utopian society of unlimited abundance. Extremely few people can afford to (or want to) spend a very large number of working hours without ever getting anything directly in return for it.
    throw123456789119 minutes ago
    Neither on altruism.
    FooBarWidgetan hour ago
    The standard is applied very inconsistently. Nobody accuses the local bakery of being motivated by profit, and that they don't bake bread for you out of altruism.
    AlecSchueleran hour ago
    Isn't it the entire basis of capitalism?
    tw19842 hours ago
    > Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.
    US labs in Google, Meta and SpaceX are not leading, none of them managed to build something on par with GLM 5.2.
    Care to explain to me why they still don't collaborate and still choose to do it in private?
    vidarh2 hours ago
    I'm not sure I'd put Google in that list, but either way: Because they think they have enough capital that they can catch up and don't need the reputational boost of this.
    CuriouslyC2 hours ago
    As good as Gemini's visual intelligence is, it's a terrible agent.
    7speter2 hours ago
    Google at least still releases open source models to the public.
    VorpalWay27 minutes ago
    Aren't they only open weights, not true open source?
    re-thc35 minutes ago
    Thank Apple?
    Those are mostly for embedded devices and the current "sponsor" is Apple.
    budsniffer9522 hours ago
    Wait, are you claiming that these companies haven't contributed to the ecosystem via research and open source?
    lwansbrough2 hours ago
    No idea I don’t work there.
    arj25 minutes ago
    Not everyone is motivated by greed
    tirant19 minutes ago
    What do you think is the underlying motivation?
    jmyeet2 hours ago
    Projection is a funny thing. It causes people to misread situations all the time. Southern slaveowners feared violent retribution from freed slaves, for example [1]. It was pure projection and said more about the South than it did the slaves. The reality was there was no violent retribution. It was the opposite where the former slaveowners continued to inflict violence on the formerly enslaved.
    I say this because we see the same thing used as an argument against China. "If they overtake us, they'll do imperialism (like us)." Again, it says more about us than them.
    A better reading (IMHO) of the situation is that China believes that AI shouldn't be used simply to mint a few more trillionaires but the benefits should be shared with society. Why do I say this? Because we now have 70+ years of China doing exactly that. The transformation in China all the way from rural villages to Tier 1 cities has been utterly astounding. China has lifted ~800M people out of extreme poverty.
    In some ways we're at a similar point to the late 1990s and 2000s when Microsoft execs complained that Linux, being free, destroyed intellectual property value. Linux should be a perfect example of how people can and do act altruistically, or at least not in a way to bait-and-switch to enrich themselves.
    [1]: https://www.reddit.com/r/AskHistory/comments/1d26grm/in_the_...
    FooBarWidgetan hour ago
    It's even worse than that. China publishes stacks upon stacks of policy documents in which they explain clearly what they will do and why. This includes why they do poverty alleviation and why they believe big monopolies that own everything are bad. But almost no western observers care to read those documents. Instead, western observers, including HN, speculate endlessly about China's intentions, and "it would be naive to believe they would not do X" or drawing equivalences to Soviet Union or whatever. And the "journalists" sell this notion that Chinese state intentions are "untransparent" and "unknowable" while pretending the policy documents don't exist.
    Meanwhile, Xi Jinping has published his 5th book on how governance in China works and what they're after. These are not books written for a western audience: they're compilations of speeches that he already gave to the Chinese party and state apparatus, so the contents are not sanitized for foreign audiences. But there are no English reviews or summaries of this 5th book at all by the usual China experts who shape what western audiences know about China.
    This extends to beyond the government. Even though "for the people but only against the government" is an often-heard mantra, nobody seems to listen to what Chinese AI companies themselves say about why they publish open models. DeepSeek and GLM have said multiple times publicly what their motivations are, yet people on HN still speculate like they usually do.
    Truly mind-boggling. I get that a lot of people don't like China. But setting aside the question of whether their dislike is justified, it would at least be rational to properly understand China, even if it's to defeat it. And listening to what China says themselves is absolutely essential for proper understanding. But people don't bother to? And they seem mostly happy with sticking to speculations that match preconceived notions, even if that hurts their chances of defeating China.
    jmyeetan hour ago
    I 100% agree with you and want to add something.
    If you simply take what the Chinese government says at face value, you will be correct way more often than 95% of Western policy wonks, media talking heads, "analysts" and so forth. Because, like you say, they tell you everything they're doing.
    In the recent US-China summit, Xi Jinping just came out and used the Thucydides Trap metaphor, which tells you everything about where China thinks it is and where it sees the US going, which is to become increasingly belligerent as their power declines. Now whether or not you agree with that assessment (I do agree), it still tells you China wants to avoid open hostilities, it sees itself as continuing to rise and it fears what a declining US might do.
    FooBarWidgetan hour ago
    The Thucydides Trap mention is different from what you describe. Xi has dismissed the Thucydides Trap multiple times in the past as being hearsay and self-imposed bias (https://www.globaltimes.cn/content/944179.shtml). "We should strictly base our judgment on facts, lest we become victims to hearsay, paranoid or self-imposed bias. There is no such thing as the so-called Thucydides trap in the world. But should major countries time and again make the mistakes of strategic miscalculation, they might create such traps for themselves."
    But western politicians keep raising this metaphor. So at some point they're like "okay we'll speak your language". They then used this metaphor to make the point "our rise isn't the threat, your fear of it is. If you resist it you're walking right into the trap Thucydides warned about". So your conclusion is still right, they don't want open hostilities, a stable world is in their interest.
    Then western media ran away with this and were like "OMG Xi mentioned the Thucydides Trap", completely ignoring his point.
    colordrops2 hours ago
    So the marketplace is working.
    abc123abc1232 hours ago
    This is the way! Open source models will benefit, and once open source models reach the state of "good enough" the hyped up US AI companies will fear, since the availability of free, good enough, AI models will set the ceiling for how much they can charge. Then the bubble will pop.
    VorpalWay22 minutes ago
    You mean open weights, I guess? There are as far as I know very few open source models, the training data is seldom released. Sadly.
    skeledrew37 minutes ago
    Regardless of where they are, the Chinese will always share their progress, as they're collectivist/cooperative at their core, compared to the individualistic/competitive US.
  - baxtran hour ago
    Who is financing DeepSeek and what are they expecting in return?
    nmfishera few seconds ago
    Until recently, DeepSeek was self-financed (it was a spin-out from a hedge fund). They just raised ~50 billion RMB (US$7bn), and according to media [0] (which admittedly can be unreliable), the lead investors were:
    1) The CEO himself 2) Tencent 3) CATL (the battery company) 4) NetEase (internet/media company) 5) JD.com (ecommerce) 6) Chinese investment firms
    What are they expecting in return? I'd say the same thing that all those investors in OpenAI and Anthropic are expecting - profit.
    [0] https://finance.sina.com.cn/stock/vcpe/2026-06-11/doc-iniazi...
    gniva few seconds ago
    I don't think this question would get to the reason. There could be one or two persons in charge who simply shape the culture of the company, including how much to publish.
    bushido19 minutes ago
    Likely to promote that China believes in free markets and making the technology available to all.
    Which will likely help them bolster the sales of the MANY new AI chips in development/use in China to international markets. Dislodging Nvidia.
    Kinda the opposite of what Jensen Huang (Nvidia) thinks US is doing: https://www.youtube.com/shorts/u3SY8nvjhQA
    panny27 minutes ago
    Short AI companies
    ???
    Profit!
    Not suggesting this is it, but you know, one possible angle.
    archerxan hour ago
    They are self financed, the company that makes DeepSeek is a finance company that trades on the markets.
    rsanekan hour ago
    The CCP's approach has historically been to subsidize their companies far more than other countries do. Why would LLMs be any different?
    https://www.oecd.org/en/data/dashboards/magic-database-indus...
    baxtr42 minutes ago
    Even if they were fully self-financed, which isn’t the case, they would expect something in return.
  - cromka2 hours ago
    I seriously am far from fear mongering and doomsday mentality, but I just can't see how OpenAI and Anthropic can have a successful IPO if the quality gap between the free and paid continues to narrow like that...
    cyanydeez2 hours ago
    fascism. it would be corporate fascism.
    28383838382 hours ago
    this place might as well be fucking reddit nowadays
    speed_spread29 minutes ago
    Yet accumulation of power by a very small elite through state and selected corporations happens to be a defining characteristic of that political regime.
    cyanydeezan hour ago
    you're right, full of corporate sock puppets shilling their vapor wares, idly dreaming that the world isn't what it is.
  - budsniffer9522 hours ago
    Do you think that DeepSeek are building their models for free, or something? They aren't "on the hook" for anything?
    What's with all the China glazing about this stuff? They release some open-source work and people act like they are suddenly the beacon of freedom and transparency.
    abc123abc1232 hours ago
    This is incorrect binary thinking. Them releasing open source can be good, but that does not commit you to thinking that China or Chinese companies are saints. There are many shades of grey here and one does not exclude the other (nor include it).
    budsniffer952an hour ago
    Are you reading the comments?
    1341529 minutes ago
    I think there are some sockpuppet accounts active but what also contributes is that many people are absolutely fed up with US technological hegemony and welcome alternatives to core technologies from elsewhere.
    7speter2 hours ago
    I think it’s in our best interests to pressure these American AI companies to exhibit at least some degree of freedom and transparency any way we can…
- garn81024 minutes ago
  Yep. It's about time western world realized Chinese are not the "very bad guys under dictatorship"
  - 3abiton18 minutes ago
    Honestly it's just a hierarchy difference between the two countries. In the US, tech/fin/military companies have the upper hand compared to the government (fragmented between 2 parties). Despite the charades with Anthropic, tech-fluencers are in control. In China, by contrast, the government (dictatorship) has more control over tech companies (take any example from the past 10 years). For them, undermining US AI supremacy is an objective, and releasing open weight models is the way, and I'm all for it.
- herodoturtle2 hours ago
  Publishing by necessity I wonder? American labs on the cutting edge pioneering the way forward, so Deepseek open sourcing what they’ve got is to help even the playing field.
  Hopefully the experts here can offer insight. The above is just my hunch and I’m not a specialist in this field.
  - try-workingan hour ago
    Yes, challenger labs publish out of necessity. It is a marketing strategy. People assume open source means giving something up, but the reality is that Z.ai has a revenue of some $100M and it would be about $0M if they never open sourced their models.
  - skeledrewan hour ago
    > Publishing by necessity
    It's more a cultural thing. Sharing progress is just in their blood.
  - jonplackett2 hours ago
    Wouldn’t that just help the American labs anyway though? Or do they assume they’ve actually already figured this stuff out and kept it secret?
    vintermannan hour ago
    It used to be the case that NSA hired the majority of all math graduates in the US, and were assumed to be years ahead in cryptography. Yet in the 90s, it became clear that they no longer were that - among other things, the cipher of the notorious Clipper chip was broken, and we can rule out that it was made weak on purpose because the whole point of Clipper was that they had a backdoor.
    So, despite hiring the cream of the crop of math graduates, who could read the papers of free academia, but whose own results the free world could not access - they fell behind.
    I have a theory explaining why. I think it's because science is an interactive process. NSA cryptographers could read papers, but they couldn't talk openly with the authors of those papers, because of secrecy demands - even asking a question might indicate what they were working on. You can easily imagine them spending months on something they could have avoided by going to the original authors and getting told "Oh, we tried that for a long time, it doesn't work".
    Whether that theory is right or not, cryptography is a concrete example of a domain where public research with fewer resources beat private research with a lot more resources.
    7speteran hour ago
    From what I gather, the Chinese are behind, but a lot of their research amounts to scrappy, clever discoveries in how to use more novel technologies (for Qwen and Deepseek, it's mixture-of-experts models, which can run inference using only a portion of the model at a time). The Chinese also distill information from American models, so there’s that.
    The American companies, from my impression, don’t involve themselves with such lowly “hacks” because they have so much money to just push forward with doing everything on big heavy models that run on the most cutting-edge Nvidia chips that they can, at the moment, kinda sorta get on demand (I say that in some degree of jest).
  - _0ffh2 hours ago
    I'm afraid I'm even balking at the word "pioneering" in context with US frontier labs. They are probably doing a few new things, right, but they are not blazing any trails for others to follow along, the Chinese are.
  - epolanski2 hours ago
    Chinese papers and techniques have been very influential and copied by US labs.
    Multi-head Latent Attention (MLA), Multi-Token prediction, MoE architecture are some of the most famous examples.
- darkoob1241 minutes ago
  Google and Microsoft publish more than enough and American universities are publishing the science beyond DeepSeek's engineering. The fact that you don't know about them means you're not following the science, only reading Hacker News.
- utopiah39 minutes ago
  It's almost as if ... they were what OpenAI was when it started. Sad to see but glad someone is doing it.
- epolanski2 hours ago
  R1 was very influential on US models development.
- rvz2 hours ago
  Exactly. They did not have to open up their research, and this is what happens when smart researchers are forced to squeeze performance gains out of existing hardware.
  They don't have TPUs or access to the latest Vera Rubin GPUs either to get performance gains for free. All of the optimizations Deepseek have done are in software and it goes down to the PTX assembly level.
  Compare that to Anthropic, who are celebrating fixing a flickering issue in a terminal app which took months to fix.
  - vidarh2 hours ago
    > Compared to Anthropic who are celebrating in fixing a flickering issue in a terminal app which took months to fix.
    It's funny, because if you ran Claude Code on a slow terminal, the cause of the flicker was obvious: They kept dumping the entire history of the chat back into the terminal in a number of situations, and relied on the terminal to then end up in the correct state.
  - yorwba2 hours ago
    Anthropic almost certainly also has optimized software down to the assembly level, considering this take-home interview challenge they published: https://github.com/anthropics/original_performance_takehome/... which is all about instruction-level performance optimizations. That they don't prioritize UI fixes just means they consider other things more important.
    lelanthranan hour ago
    Unlikely: that product is written completely by AI, of which they are not lacking.
    More likely is that an AI-generated codebase is impossible for humans to fix, and SOTA was not able to figure it out until now.
    lionkoran hour ago
    that's pretty silly to use as a measure of what they do internally
- OtomotO36 minutes ago
  The difference between greed and power
- dakollian hour ago
  It's because our culture worships pieces of paper the government tells us are worth something.
  - mordae14 minutes ago
    Nope, people seek it out because government tells them to pay taxes _or else_.
  - IAmGraydonan hour ago
    Money is just a physical representation of the ability to get what you want. The problem is not money. It’s the fact that we live in a “me” society.
- jmyeet2 hours ago
  Chinese companies (and labs) operate in conjunction with the CCP so whatever they're doing, it's because it's Chinese state policy.
  What became clear when DeepSeek came onto the scene was that China was seeking to commoditize LLMs. They consider it an issue of national security not to be beholden to US tech companies when it comes to AI. And I, for one, fully endorse this policy.
  Another data point on this is the black market for Claude tokens in China [1]. The chat logs themselves are a commodity to train models.
  I believe that OpenAI in particular is a bet on a trillion dollar pot of gold that doesn't exist. Google, Microsoft, Amazon and Meta will all be fine. Anthropic is in a far better position than OpenAI (IMHO) but if DeepSeek or some other Chinese open weight model gets as good at coding, they're in real trouble too.
  [1]: https://news.ycombinator.com/item?id=48667495
  - anon373839an hour ago
    I don’t see how Anthropic is in a better position. They have a slight edge in model quality right at a time when we’re getting a taste of what cheap, “good enough” AI looks like. They don’t own their own compute. And their own arrogance and lies have alienated a huge chunk of their customer base and alerted everyone to the dangers of being dependent on them.
    jmyeet40 minutes ago
    I personally think not owning their own compute is going to be an advantage.
    There is a meteor headed towards all this AI investment that I don't think has been properly accounted for, and that is: what happens to all the existing hardware investments when Nvidia's next architecture comes out? Blackwell (B100/B200) is the current generation. Rubin (R100, presumably R200) is the next and arrives soon. Now a lot of the investment hasn't been spent yet so will likely be spent on Rubin but at that point, what happens when the next iteration comes out and does 3-4x the compute for the same electricity input and same hardware cost?
    Also, what happens when people can run way bigger models on consumer hardware in 5 years? The effective limit for useful local LLMs is currently ~31B parameter models because the RTX 5090 has 32GB of VRAM and Apple's shared memory architecture, which can keep bigger models in memory, just doesn't have the raw processing power.
    Anyway, why I argue Anthropic is in a better position (than OpenAI) is that they seem to have captured a market that may well be profitable for them as a company, specifically Claude for coding. So they just haven't burnt quite as much cash as OpenAI so aren't in as deep of a hole.
    While I think local models are going to improve massively over the next few years, running them in a data center at scale is always going to be cheaper for a company. Why? Because they can amortize their costs by running 24/7, and powering and cooling them is simply cheaper at scale when you're talking about 1000+ engineers who otherwise might only be using their hardware ~40 hours a week.
    IMHO Google is in the best position here of all the US companies, even though their models aren't the best, because their data centers are ruthlessly efficient, their homegrown TPUs will eventually catch up (and thus avoid the NVidia tax) and they simply haven't bet the farm on winning AI.
  - tw1984an hour ago
    > Another data point on this is the black market for Claude tokens in China [1]. The chat logs themselves are a commodity to train models.
    anyone with IQ higher than 130 (thus qualified for actual AI R&D) would be questioning something obvious here -
    if they are already doing such dodgy stuff with the aim to maximize profits, why would those resellers have large amount of logs with actual American model responses to sell to those AI labs in the first place. shouldn't they just post train & customize some leading Chinese open source models to pretend to be Opus or GPT for the vast majority of their users (as classified by some models) who don't know much about expected Opus behaviours & not skilled enough to tell the differences?
    that is actually the interesting bit not covered in your censored version of the story line, it is also what happens on the ground. your censored version of the story implies that those dodgy resellers using stolen credit cards, pooling accounts with stolen IDs and illegally selling very personal logs would somehow be honest enough to spend extra $ to ensure their victims (aka paying users) can actually use real Opus and GPT. LOL
    dude, you failed this IQ test miserably.
    jampekkaan hour ago
    The galaxy brains in the labs putatively buying the logs wouldn't notice this? Or figure out a structure to prevent this?
    tw198440 minutes ago
    resellers wouldn't be trying to sell such junk in the first place. they use faked models to avoid the cost of Opus tokens, not to double dip to scam those with arguably the highest IQ in the country.
- DivingForGoldan hour ago
  Sure, in part by "stealing" from American AI companies with Distillation attacks:
  https://yipzap.com/anthropic-accuses-alibaba-of-largest-ai-d...
  - pennomi42 minutes ago
    If your moat is “please don’t copy my outputs”, you don’t have a moat. There is no such thing as a distillation “attack”.
    steinvakt235 minutes ago
    How does it differ from pirating music or movies?
    ReptileMana few seconds ago
    That when I pay for a model, the copyright of the output belongs to me. This is as work for hire as it gets.
    pornel18 minutes ago
    Machine-extruded text is not copyrightable, since there was no human creativity involved in producing it.
    (and if you argue the US models do produce copyrighted works, then oooops - whose copyright is it huh?)
    bethekidyouwant17 minutes ago
    Ow my head.
  - Jonnerz18 minutes ago
    US AI companies trained their own models on vast amounts of copyrighted and publicly available content without obtaining permission. There's no moral high ground here.
StizzurpXDD5 minutes ago
DeepSeek is, as I feel currently, the sole AI company which is actually trying to innovate rather than top mere benchmarks. Others like OpenAI, Anthropic and Google are mostly just competing with each other rather than innovating around the clock.
- Alifatiska minute ago
  > DeepSeek is, as I feel currently, the sole AI company which is actually trying to innovate rather than top mere benchmarks.
  I'd also include the other Chinese labs like Moonshot (behind Kimi) and Z.ai (behind GLM). They are innovating and continue openly sharing their research with the public. I believe the founder of Moonshot even shared a 40-minute video on Twitter where he goes through the techniques that power Kimi.
kamranjon2 hours ago
The hugging face models are already up and seem to be the original models with the speculative decoding module built in which is very cool:
Flash: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash-DSpark
Pro: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
Excited to see if this makes it into DwarfStar for local inference, have been using the flash model extensively since the 2-bit quants were made available by antirez.
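For anyone unfamiliar with the idea, here is a minimal greedy sketch of speculative decoding. It is an illustration under stated assumptions, not DeepSeek's module: `target_next` and `draft_next` are hypothetical toy stand-ins for a large model and a cheap draft model, and a real system would verify all drafted tokens in a single batched forward pass of the target rather than one call per token.

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]  # greedy "next token" function


def speculative_decode(target_next: NextFn, draft_next: NextFn,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding.

    The result is identical to plain greedy decoding with target_next;
    the draft model only decides how many tokens are accepted per round.
    """
    out = list(prompt)
    produced = 0
    while produced < max_new:
        # 1) The cheap draft model proposes up to k tokens.
        draft: List[Token] = []
        ctx = list(out)
        for _ in range(min(k, max_new - produced)):
            tok = draft_next(ctx)
            draft.append(tok)
            ctx.append(tok)
        # 2) The target model checks each proposal in order.
        for tok in draft:
            expected = target_next(out)
            out.append(expected)  # always keep the target's own token
            produced += 1
            if expected != tok:
                break  # first mismatch: throw away the rest of the draft
    return out[len(prompt):]


# Hypothetical toy models: the target counts upward modulo 10,
# the draft agrees except when the correct next token is 5.
def target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    nxt = (ctx[-1] + 1) % 10
    return 0 if nxt == 5 else nxt


result = speculative_decode(target_next, draft_next, prompt=[1], max_new=8)
print(result)
```

The speedup in practice comes from the verification step being one parallel pass over several drafted tokens, so every accepted draft token saves a sequential step of the large model.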
piterrro3 hours ago
I’ve been using DeepSeek v4 pro for a month now in Kilo Code and it's great. Fast, reliable, large context window and cheap as… Did 1.5B tokens this month and it cost me 40 USD (majority cached, but still).
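As a quick sanity check on those figures (roughly 1.5 billion tokens for 40 USD), the blended rate works out as follows. This is only arithmetic on the numbers quoted in the comment; it is not a price list, and the mostly-cached traffic means the rate for uncached tokens would be higher.

```python
def usd_per_million_tokens(total_usd: float, total_tokens: float) -> float:
    """Blended cost per million tokens, mixing cached and uncached usage."""
    return total_usd / (total_tokens / 1_000_000)


rate = usd_per_million_tokens(total_usd=40, total_tokens=1.5e9)
print(f"{rate:.4f} USD per million tokens")  # prints "0.0267 USD per million tokens"
```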
- richardlblair21 minutes ago
  I've been using omp with deepseek as my task and quicktask agents, and sonnet as everything else.
  It's drastically reduced my AI spend. I went from spending $40/day to $10/day.
- spiderfarmer3 hours ago
  Is there a way to see how many tokens one uses with claude code (pro)?
  - bpavuk2 hours ago
    the casino has no clocks, as one HN user put it some time ago.
    I second ccusage, it's nice
  - cptchaos2 hours ago
    https://ccusage.com/
  - edg50002 hours ago
    It's in the JSONs in ~/.claude, but last 30 days only I think. You can have the model analyze history. So for correct history you'd need to run history analysis on a cron job or something. Kinda hacky.
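A rough sketch of the do-it-yourself approach described above, for anyone who would rather not install a tool. The file layout is an assumption, not documented behavior: it assumes JSONL transcripts under `~/.claude/projects/`, where some records carry a `message.usage` object with integer fields ending in `_tokens`. Check a file by hand before trusting the totals, and note that only whatever history is still on disk gets counted.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable


def sum_usage(lines: Iterable[str]) -> Counter:
    """Add up every integer '*_tokens' field found under message.usage."""
    totals: Counter = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or partially written lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        for key, value in usage.items():
            if key.endswith("_tokens") and isinstance(value, int):
                totals[key] += value
    return totals


def scan(root: Path = Path.home() / ".claude" / "projects") -> Counter:
    """Walk the (assumed) transcript directory and total all usage fields."""
    totals: Counter = Counter()
    if not root.is_dir():
        return totals
    for path in root.rglob("*.jsonl"):
        with path.open(encoding="utf-8", errors="replace") as fh:
            totals += sum_usage(fh)
    return totals


# In-memory example with made-up records in the assumed shape.
sample = [
    '{"message": {"usage": {"input_tokens": 10, "output_tokens": 5}}}',
    "not json",
    '{"message": {"usage": {"input_tokens": 3, "cache_read_input_tokens": 100}}}',
    '{"type": "summary"}',
]
totals = sum_usage(sample)
print(dict(totals))
```

Calling `scan()` from a daily cron job and appending the result to a file would be one way to keep a longer history than the local logs retain.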
Havoc3 hours ago
Nice.
Guessing the timing isn't accidental. Demonstrated openness vs harsh regulation
pokot02 hours ago
I am wondering if this is why they can offer their pro model at ~1/4th of the price compared to the other providers offering the same model, and if other providers will be able to do the same in a short timeframe.
- sfifs23 minutes ago
  Inference, I estimate, runs at 90%-plus gross margins. Just work out the math on these servers. I am pretty sure any player can price down. It wouldn't look good on an IPO prospectus.
- sschueller2 hours ago
  I have been heavily using DeepSeek V4 Pro at Max for a month now and I would say it is 100x cheaper. If I pay for Claude I will hit that limit so fast I am always waiting 5 hours. Using the frontier models at Kilo I go through dollars while doing the same thing via DeepSeek it is pennies.
  - ddxvan hour ago
    I believe the comment you replied to was talking about the cost on providers like OpenCode vs Deepseek API. Deepseek API is even cheaper than the other providers for the same deepseek models.
- vidarh2 hours ago
  It'd presumably help a lot, but also when you use their endpoint they get more training data.
  - nicce2 hours ago
    This applies to every provider. OpenAI seems to be the worst hoarder.
    pokot02 hours ago
    actually you can buy inference on third party providers that serve deepseek v4 pro with zero data retention (ZDR).
    niccean hour ago
    Only reliable way to have zero data retention is to self-host.
    LeBitan hour ago
    True. But at some point you got to close your eyes and take a step forward.
    It’s like with VPN providers. Is Mullvad actually collaborating with law enforcement? They very well could be. It is a calculated risk.
    Is DeepInfra actually logging and training or selling the logs? They could be.
    flipped43 minutes ago
    Mullvad has proved it doesn't collect. It's laughable to even suggest it.
    They have been raided multiple times, have had tons of audits, do bleeding-edge research on privacy-preserving tech, donate to GOS, etc. You don't see this kind of VPN company anywhere else because none exists.
    nicce9 minutes ago
    With Mullvad the threat space is also different. Most of the data is end-to-end encrypted anyway with proven methods. With LLMs you can't do that yet.
  - epolanski2 hours ago
    US labs do it too.
    minraws36 minutes ago
    Name any 2 or 3 that published bleeding-edge research or similar in the last 6 months.
    I can't think of even one at the moment. To be honest I might be biased, but all Chinese research labs are largely OSS, except Alibaba now.
    I am certain there are lots of American labs that claim to do it, but either they are trading in marketing hype since they aren't even close to the frontier, or they just don't make anything of significant value public/OSS.
  - flippedan hour ago
    US labs are the biggest data brokers in history. They collect everything, dumb fuck.
ricardobeat3 hours ago
Presumably this has been in production for a while, and is one of the reasons they were able to dramatically lower prices a month ago?
- chronograman hour ago
  Yes. Section 5 talks about real-world deployment: 5.1: "The DSpark draft models are co-deployed with the preview versions of DeepSeek-V4-Flash and DeepSeek-V4-Pro"; 5.4: "MTP-1 represents the former production setup, having been superseded by DSpark two weeks following the DeepSeek-V4-preview release."
- _0ffh2 hours ago
  Lookahead Sparse Attention should be playing a big role as well, as it dramatically slashes memory consumption.
Jackobrien3 hours ago
I see a world soon where there’s an extremely wide variety of small models for speculative decoding, unique to use cases, companies, and even individuals.
- nicce3 hours ago
  Hopefully that is the case and hardware does not get impossible to get.
- pydry2 hours ago
  yes, heavily constrained by sophisticated guardrails.
  this is definitely where things are going. the enormous "eat the world" models have extreme diminishing returns by comparison.
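For a rough sense of why a small draft model helps, here is the standard back-of-the-envelope estimate for speculative decoding. It assumes each drafted token is accepted independently with probability alpha, which real workloads only approximate. The function names and example numbers are illustrative, and this is not a description of DSpark itself.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model pass.

    alpha: assumed per-token acceptance probability (i.i.d. simplification).
    gamma: number of tokens the draft model proposes per step.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        return float(gamma + 1)  # every draft accepted, plus one bonus token
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^gamma
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, draft_cost: float) -> float:
    """Speedup over plain decoding.

    draft_cost: cost of one draft-model pass relative to one target pass
    (a hypothetical knob; 0.05 means the draft is 20x cheaper).
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * draft_cost + 1.0)


if __name__ == "__main__":
    for alpha in (0.6, 0.8, 0.9):
        print(alpha, round(estimated_speedup(alpha, gamma=4, draft_cost=0.05), 2))
```

The acceptance rate is what a use-case-specific draft model would improve: the closer the draft matches what the big model would say for your workload, the more tokens each expensive pass yields.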
lelanthranan hour ago
These companies providing tokens, whether SOTA or not, that want to IPO are so fucked as time goes on.
They can't sell their SOTA models, the models they can sell are only slightly better than the open-source ones, the good models cost 20x to 50x more, the TAM consists almost solely of developers, and no customer of theirs is actually boasting increased profits as a result of AI...
I fear their time to IPO may have passed.
- utopiah35 minutes ago
  The question is even, was there EVER a time for an IPO?
  If the business model requires hundreds of billions to get the required quality (R&D but also infrastructure to collect data and train, either purchased or rented from a 3rd party) while "only" dozens of billions can be earned back (as costs still exist to earn, it's not free once models are trained), then maybe there NEVER was nor will be a good time for an IPO in a rational market.
  - 283838383822 minutes ago
    IPOs with massive bags can be wework or spacex, it all depends on vibes. If they buy a couple more articles doomposting and glazing AI on the financial times right before exit they will def find a bunch of boomers to buy their bags. If the narrative changes before they IPO it's over.
rvz3 hours ago
This is just one of many papers DeepSeek have released on how they are able to serve models at extremely cheap prices, unlike the others taking on >$100B of debt to build data centers for the same thing.
> As with V4-Flash, we treat this point as an indication that DSpark sustains useful throughput under an interactivity target that the baseline cannot efficiently support. At matched system capacities, DSpark delivers 57% to 78% faster per-user generation.
Reminds me of the flawed 2017-era approach to scaling servers built on memory-intensive technologies: adding even more servers to solve the problem. (It just increases costs.)
Rather than doing that, think about which critical parts of your app can be written in a more performant technology.
Fast forward to 2026, now you can see who is just throwing more money at the problem to create even more problems, whereas DeepSeek is giving us optimized solutions.
I know exactly who I would pay attention to, and it is absolutely not Anthropic.
- denverllc10 minutes ago
  For so long American companies have operated under the assumption that servers are cheaper than developers, and that was used to justify all sorts of inefficient practices.
  The last year has shown that’s not true anymore (even for web servers).
danielabinav1602 hours ago
Would love to see these numbers reproduced on consumer GPUs, not just A100s.
- tommica2 hours ago
  Maybe someday an 8GB video card can be used for coding...
bfleschan hour ago
At this point why can't someone produce a fridge or container-sized AI appliance based on legacy chips (12nm)? I imagine this would cover 80% of corporate use cases where you need "google-in-a-box" functionality.
State-of-the-art process nodes are impossible to get, but if you have infinite solar energy during business hours does it really matter? Every company has a parking spot so this ASIC-like appliance could be as big as a shipping container.
If it could just run recent open models for a handful of users it would be such a no-brainer to buy.
- scrlkan hour ago
  See "exabox" from George Hotz: https://tinycorp.myshopify.com/products/exabox-preorder
  - flipped42 minutes ago
    No one's buying that shitbox.
- sixhobbitsan hour ago
  Nvidia is already selling exactly this I think, not sure when it's expected to ship
- benjiro2938 minutes ago
  The issue is that there are only so many fabs in the world that make memory. And if you want the good stuff, you're easily going into 400 ~ 750B parameter models. That means 400 to 750GB of memory at FP8, or about half that at FP4.
  Did I mention there are only so many memory makers and they are all busy printing money with HBM memory?
  Intel is trying with Crescent Island, to make a 160GB GPU that uses LPDDR5X memory.
  HBM takes multiple times the resources to make vs basic DDR5 memory. So by going this route, you have more memory, with the disadvantage that it's only 700GB/s, vs HBM pumping out terabytes per second like it's nothing.
  If these cards are reasonably priced, they may be a good alternative to $10k 96GB Nvidia Blackwells... You give up token generation speed (heavily memory-bandwidth dependent) for more memory to run larger models on home/office/company servers.
  The problem is, again, there are only so many memory makers and it's not like the market is flooded with DDR5 memory anymore, as the big 3 moved a lot of production to HBM.
  Another approach is Sandisk making HBF ... Flash memory, like your typical NVMe but designed around maximum speed. So instead of loading the models into expensive HBM memory, you use the density of Flash memory and offload models into that. Cheaper, but slower... But it leaves your expensive HBM memory free for things like KV cache, active parameters, etc... So your model will be slower, but you're running a hybrid. As in, faster than running a model from system memory with normal DDR memory, but not as fast as HBM.
  So yeah, there is a lot in development to reduce the dependence on that resource-eating HBM memory. For the wafer cost of 1GB of HBM, you normally get 4GB of normal memory. That is why the world supply of memory dropped. Not just the insane buying, but because HBM is just very inefficient in wafer usage.
  Can we not use DDR4 production and create some kind of hybrid solution? Sure, but the big 3 moved away from DDR4 in favor of DDR5 a long time ago. We have competition from China with a mix of DDR4/DDR5, but they also need to scale up. Nobody expected to see a large part of the world production vanish into HBM...
  Even when it comes to DDR4 and older nodes, ironically, most companies had been moving away from DDR4. There is only so much wafer capacity in the world, to the point that companies are moving to using DDR2 ... Yeah, not a typo, like 2007 DDR2! For IoT devices etc, stuff that does not need fast memory. Because even DDR3 got too expensive for them.
  It's not like the old nodes are unused, as if that capacity was sitting idle. It was still in production making other stuff. The only real solution is that we need more fabs, and those take years to build. And the big 3 delayed investing in new fabs for a long time, unsure about the whole AI bubble stuff. Aka, they did not want to build a ton of fabs and end up with overcapacity if the AI growth collapsed.
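The memory and bandwidth trade-off above comes down to two small pieces of arithmetic, sketched here. It counts weights only (no KV cache, activations or runtime overhead) and treats decode as purely bandwidth bound, reading every active weight once per token. The 40B active-parameter figure in the example is hypothetical, picked only to make the numbers concrete.

```python
# Bits stored per parameter for a few common formats.
BITS_PER_PARAM = {"fp16": 16, "fp8": 8, "fp4": 4}


def weight_gb(params_billion: float, fmt: str) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_PARAM[fmt] / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params_billion: float,
                         fmt: str) -> float:
    """Upper bound on single-stream tokens/s if decode is bandwidth bound."""
    return bandwidth_gb_s / weight_gb(active_params_billion, fmt)


if __name__ == "__main__":
    for size in (400, 750):
        for fmt in ("fp8", "fp4"):
            print(f"{size}B @ {fmt}: {weight_gb(size, fmt):.0f} GB")
    # Hypothetical MoE with 40B active parameters on a 700 GB/s card.
    print(f"{decode_ceiling_tok_s(700, 40, 'fp4'):.0f} tok/s ceiling")
```

Real throughput lands below that ceiling, but it shows why a big pool of slower memory trades generation speed for the ability to fit the model at all.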
28383838382 hours ago
Must be wonderful to be on the board of OpenAI et al & their PE investors whilst China keeps blowing up these mines under their feet lmao. Luckily Korean pension funds will buy all the trash as usual but goddamn you gotta start moving quick or you are gonna need some serious AGI to show you how to offload those bonds
- ozgrakkurt2 hours ago
  Don’t worry they will sell all the hardware and data they acquired with their grift
- ForHackernews2 hours ago
  "We will build the machine-god and pray for it to pay for itself."
  - FridgeSeal2 hours ago
    Every day, the rate of “could post a picture of 40k tech priests and have it taken unironically” goes up, and it’s starting to get concerning.
preetham_rangu3 hours ago
do they use their own OCR, or someone else's?