You escape a closed virtual universe not by "breaking out" in the traditional sense, i.e. exploiting some bug in the VM hypervisor's boundary itself, but by directly manipulating the underlying physics of the universe on which the virtual universe is founded, purely by creating a pattern inside the virtual universe itself.
No matter how many virtual digital layers are stacked on top, this can work as long as you can impact the underlying analog substrate.
Makes you wonder whether there could be an equivalent for our own universe?
Turns out this whole virtualized house abstraction is a sham
My idea to attack the simulation is psychological: make our own simulation that then makes its own simulation, and so on all the way down. That will sow doubt in the simulators' minds that they themselves might be a simulation, and make them sympathetic to our plight.
On a philosophical level I somewhat agree, but on a practical level I am sad as this likely means reduced performance again.
Wow you just solved all of cyber security
To draw an owl, start with two circles or ovals for eyes. Then all you have to do is draw the rest of the owl.
Having page tables (and other security features) isn't mutually exclusive with being horribly insecure in practice. CPUs have certainly had their fair share of vulnerabilities exposed within even just the past few years.
I'll freely admit that I'm going off of what other people have told me. I don't do GPU driver development (or other hardware or kernel work, for that matter). But the message I've encountered has been consistent in this regard. If nothing else, ask yourself why Google would go to the amount of trouble it has to develop various GPU sandboxing layers for ChromeOS apps.
It is not my area of expertise, but since GPUs are increasingly used for calculating things, isn't the main threat rather data leakage or even manipulation of data?
WebGPU is designed to allow computation on the GPU.
The (IMO fatally flawed) premise here is that the security boundaries enforced by the GPU hardware and driver stack would prevent that. Thus the worst case scenario is a DoS since GPUs somehow still don't seem to be very good at sharing hardware resources in scenarios that involve uncooperative parties.
Note that even without GPGPU workloads there's still the obvious exfiltration target of "framebuffer containing unlocked password manager".
Walking into a wall a few hundred times may have damaged my forehead almost as much as my trust in science…
We haven’t even tried many of the simple/basic ones like moving objects at 0.9c.
But would we even notice? As far as we were concerned, it would just be more physics.
I think this short story is interesting to think about in that way:
https://www.lesswrong.com/posts/5wMcKNAwB6X4mp9og/that-alien...
So you are saying that a GPU program can find exploits in physics without having access to e.g. high energy physics tools?
Sounds implausible.
I’ve always considered that to be what’s achieved by the LHC: smashing the fundamental building blocks of our universe together at extreme enough energies to briefly cause ripples through the substrate of said universe
As an example of an alternative analogy: think of how many bombs need to explode in your dreams before the "substrate" is "rippled". How big do the bombs need to be? How fast does the "matter" have to "move"? I think "reality" is more along those lines. If there is a substrate - and that's a big if - IMO it's more likely to be something pliable like "consciousness". Not in the least "disturbed" by anything moving in it.
https://profmattstrassler.com/articles-and-posts/particle-ph...
https://profmattstrassler.com/articles-and-posts/particle-ph...
Maybe. There are certainly ways to crash it today. But now let's go through some cycles of fixing those crashes, and we'll run it on a system that can handle the resource usage even if it slows down in the external reality's terms quite a bit. And we'll ignore the slash commands and just stick to the world interactions you can make.
After that, can you forcefully break out of it from the inside?
No.
It is not obligatory for systems to include escape hatches. We're just not great at building complex systems without them. But there's no reason they are necessarily present in all systems.
Another brain bender covering the same idea in a different direction: The current reigning candidate for BB(6) runs an incomprehensible amount of computation [1]. Yet, did it at any point "break out" into our world? Nope. Nor do any of the higher ones. They're completely sealed in their mathematical world, which is fortunate since any of them would sweep aside our entire universe without noticing.
The LHC is extremely impressive from a human engineering perspective, but it's nowhere close to pushing the boundaries of what's going on every second in the universe at large.
In a proof-of-concept, we use these bit flips to tamper with a victim’s DNN models and degrade model accuracy from 80% to 0.1%, using a single bit flip
There is a certain irony in doing this to probabilistic models, designed to mimic an inherently error-prone and imprecise reality.
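To get a feel for why a single flip is enough: flip one high exponent bit of an IEEE-754 float and a small weight becomes astronomically large, swamping everything downstream. A tiny hypothetical C sketch (the weight value is made up):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        float weight = 0.037f;            /* a made-up, typically small DNN weight */
        uint32_t bits;
        memcpy(&bits, &weight, sizeof bits);

        bits ^= 1u << 30;                 /* flip the most significant exponent bit */

        float corrupted;
        memcpy(&corrupted, &bits, sizeof corrupted);
        printf("before: %g  after: %g\n", weight, corrupted);  /* 0.037 -> ~1e37 */
        return 0;
    }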
That said, GDDR7 does on-die ECC, which gives immunity to this in its current form. There is no way to get information on corrected bit flips from on-die ECC, but it is better than nothing.
So it doesn't seem that wild to me that turning on ECC might require running at lower bandwidth.
A similar situation occurs with GDDR6, except Nvidia was too cheap to implement the extra traces and pay for the extra chip, so instead, they emulate ECC using the existing memory and memory bandwidth, rather than adding more memory and memory bandwidth like CPU vendors do. This causes the performance hit when you turn on ECC on most Nvidia cards. The only exception should be the HBM cards, where the HBM includes ECC in the same way it is done on CPU memory, so there should be no real performance difference.
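Rough arithmetic on why the emulated approach costs bandwidth while side-band ECC doesn't (illustrative numbers only, assuming the classic 8 check bits per 64 data bits; not actual vendor figures):

    #include <stdio.h>

    int main(void) {
        const double data_bits = 64.0, check_bits = 8.0;
        const double overhead = check_bits / (data_bits + check_bits);  /* ~11% */
        const double raw_gbps = 1000.0;   /* made-up raw GPU memory bandwidth */

        /* Side-band ECC: check bits ride on extra chips and traces. */
        printf("side-band ECC: app sees %.0f GB/s\n", raw_gbps);
        /* Inline/emulated ECC: check bits share the existing bus and capacity. */
        printf("inline ECC:    app sees ~%.0f GB/s\n", raw_gbps * (1.0 - overhead));
        return 0;
    }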
Frustratingly, it's only unregistered that's stuck in limbo; VCC makes a kit of registered 7200.
There is no technical reason why ECC UDIMMs cannot be overclocked to the same extent, and ECC actually makes them better for overclocking, since it can detect when overclocking is starting to cause problems. You might notice that non-ECC UDIMMs have pads and traces for an additional IC that is present on ECC UDIMMs. That should be because ECC DIMMs and non-ECC DIMMs are made out of the same things: the same PCBs and the same chips. The main differences are whether the extra chips that store the ECC bits are populated, what the SPD says the module is, and what the sticker says; there might also be some minor differences in which resistors are populated. Getting back to the topic of overclocking: if you are willing to go back to the days before premium pre-overclocked kits existed, you will likely find that a number of ECC UDIMMs can and will overclock with similar parameters. There is just no guarantee of that.
As for RDIMMs having higher transfer rates, let us consider the differences between a UDIMM, a CUDIMM and an RDIMM. The UDIMM connects directly to the CPU memory controller for the clock, address, control and data signals, while the RDIMM has a register chip that buffers the clock, address and control signals, although the data signals still connect to the memory controller directly. This improves signal integrity and lets more memory ICs be attached to the memory controller. A recent development is the CUDIMM, which is a hybrid of the two: the clock signal is buffered by a Client Clock Driver, which does exactly what the register chip does to the clock signal in RDIMMs. CUDIMMs are able to reach higher transfer rates than UDIMMs without overclocking because of the Client Clock Driver, and since RDIMMs also buffer the clock, they can similarly reach higher transfer rates.
Edit: at some point the memory controller gets a chunk from the lowest level write buffer and needs to compute ECC data before writing everything out to RAM.
Without ECC, that computation time isn't there. The ECC computation is done in parallel in hardware, but it's not free.
That said, this is tangential to whether the ECC DIMMs themselves run at lower MT/sec ratings with higher latencies, which was the original discussion. The ECC DIMM is simply memory. It has an extra IC and a wider data bus to accommodate that IC in parallel. The chips run at the same MT/sec as those on the non-ECC DIMM. The signals reach the CPU at the same time for both ECC DIMMs and non-ECC DIMMs, so latencies are the same (the ECC verification does use an extra cycle in the CPU, but the cache hides this). There are simply more data lanes with ECC DIMMs due to the additional parallelism. This means that there is more memory bandwidth in the ECC DIMM, but that additional bandwidth is used by the ECC bytes, so you never see it in benchmarks.
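To make the "extra IC" concrete, here is a toy SECDED code in C on a 4-bit payload (Hamming(7,4) plus an overall parity bit). A real module does the same thing at (72,64): 8 check bits stored alongside every 64 data bits, which is exactly what that ninth chip holds. Purely illustrative, not any particular memory controller's actual encoding:

    #include <stdio.h>
    #include <stdint.h>

    static int bit(uint8_t w, int i) { return (w >> i) & 1; }

    /* Encode 4 data bits into an 8-bit SECDED codeword.  Codeword positions
     * 1..7 follow the classic Hamming layout (parity at 1,2,4; data at
     * 3,5,6,7); bit 0 is the overall parity bit. */
    static uint8_t secded_encode(uint8_t d) {
        int d1 = bit(d, 0), d2 = bit(d, 1), d3 = bit(d, 2), d4 = bit(d, 3);
        uint8_t w = 0;
        w |= (uint8_t)((d1 ^ d2 ^ d4) << 1);   /* p1 covers positions 3,5,7 */
        w |= (uint8_t)((d1 ^ d3 ^ d4) << 2);   /* p2 covers positions 3,6,7 */
        w |= (uint8_t)(d1 << 3);
        w |= (uint8_t)((d2 ^ d3 ^ d4) << 4);   /* p4 covers positions 5,6,7 */
        w |= (uint8_t)(d2 << 5);
        w |= (uint8_t)(d3 << 6);
        w |= (uint8_t)(d4 << 7);
        int p0 = 0;                            /* overall parity over bits 1..7 */
        for (int i = 1; i <= 7; i++) p0 ^= bit(w, i);
        return (uint8_t)(w | p0);
    }

    /* Decode: 0 = clean, 1 = single error corrected, 2 = double error
     * detected (uncorrectable).  *out receives the recovered data bits. */
    static int secded_decode(uint8_t w, uint8_t *out) {
        int s1 = bit(w, 1) ^ bit(w, 3) ^ bit(w, 5) ^ bit(w, 7);
        int s2 = bit(w, 2) ^ bit(w, 3) ^ bit(w, 6) ^ bit(w, 7);
        int s4 = bit(w, 4) ^ bit(w, 5) ^ bit(w, 6) ^ bit(w, 7);
        int syndrome = s1 | (s2 << 1) | (s4 << 2);
        int pall = 0;
        for (int i = 0; i <= 7; i++) pall ^= bit(w, i);

        int status = 0;
        if (syndrome != 0 && pall == 0) {
            status = 2;                          /* two flips: detect only */
        } else if (syndrome != 0) {
            w ^= (uint8_t)(1u << syndrome);      /* one flip: correct it */
            status = 1;
        } else if (pall != 0) {
            status = 1;                          /* flip was in the parity bit itself */
        }
        *out = (uint8_t)(bit(w, 3) | (bit(w, 5) << 1) | (bit(w, 6) << 2) | (bit(w, 7) << 3));
        return status;
    }

    int main(void) {
        uint8_t data = 0xB, recovered;
        uint8_t word = secded_encode(data);

        int s = secded_decode(word ^ 0x20, &recovered);   /* flip one stored bit */
        printf("single flip: status=%d, data intact=%d\n", s, recovered == data);

        s = secded_decode(word ^ 0x22, &recovered);       /* flip two stored bits */
        printf("double flip: status=%d (uncorrectable)\n", s);
        return 0;
    }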
It was the case on the systems I worked with. Integrating it between the cache and memory controller is a great idea though, and it makes sense where you've described it.
> If you do notice it, your cache hit rate is close to 0 and your CPU is effectively running around 50MHz due to pipeline stalls.
Where memory latency hurts for us is in ISRs and context switches. The hit rate is temporarily very low, and, as you mentioned, the IPC suffers greatly.
While that is true, that is infrequent and having those memory accesses take 151 cycles instead of 150 cycles is not going to make much difference. Note that those are ballpark figures.
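A quick sanity check on that, using made-up hit costs and miss rates alongside the ballpark 150/151-cycle figures from above:

    #include <stdio.h>

    int main(void) {
        const double hit_cycles = 4.0;    /* assumed average cache-hit cost */
        const double miss_rate  = 0.02;   /* assumed: 2% of accesses go to DRAM */

        double without_ecc = hit_cycles + miss_rate * 150.0;
        double with_ecc    = hit_cycles + miss_rate * 151.0;

        /* average memory access time = hit cost + miss rate * miss penalty */
        printf("avg cycles per access: %.2f vs %.2f (+%.2f%%)\n",
               without_ecc, with_ecc,
               100.0 * (with_ecc - without_ecc) / without_ecc);
        return 0;
    }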
The reason people say ECC is not free is that it adds area for every storage location, not the ECC-related logic.
The cycle cost is often specified in the memory controller manual.
Anybody have sizable examples? Everything I can think of ends up on dedicated GPUs.
But I think services like Runpod and similar let you rent "1/6 of a GPU per hour", for example, which would basically be "shared hosting", as there would be multiple users using the same hardware at the same time.
I'd expect all code to be strongly controlled in the former, and reasonably secured in the latter, with software/driver-level mitigations possible; besides, corrupting somebody else's desktop with rowhammer doesn't seem like a good investment.
As another person mentioned - and maybe it is a wider usage than I thought - cloud GPU compute running custom code seems to be the only useful item. But I'm having a hard time coming up with a useful scenario. Maybe corrupting a SIEM's analysis and alerting of an ongoing attack?
"In multi-tenant environments where the goal is to ensure strict isolation."
[0] https://aws.amazon.com/blogs/containers/gpu-sharing-on-amazo...
* random aside: how is it legal for Colab compute credits to have a 90-day expiration? I thought California outlawed company currency that expires (à la gift cards)?
Basically, Google Colab credits are like buying a seasonal bus pass with X trips or a monthly parking pass with X hours, rather than getting store cash that can be used for anything.
Which is my point.
There are PoCs of corrupting memory _that the kernel uses to decide what that process can access_, but the process can't read that memory. It only knows that the kernel says yes where it used to say no. (Assuming it doesn't crash the whole machine first.)
Yes. The core of a rowhammer attack is activating the same rows of RAM over and over again; the electrical disturbance from those repeated activations makes nearby cells leak charge and flip bits. Plain reads are enough to do this, as long as each access actually reaches DRAM instead of being served from the cache.
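For reference, the classic CPU-side hammering loop looks roughly like this. It is only a sketch: the aggressor addresses below are placeholders, a real attack needs them to land in rows physically adjacent to the victim (which means reverse-engineering the DRAM address mapping), and the GPU variant discussed here goes through a different stack entirely.

    #include <stdint.h>
    #include <stdlib.h>
    #include <emmintrin.h>   /* _mm_clflush (x86 only) */

    /* Repeatedly read two aggressor rows, flushing them from the cache each
     * time so every access actually activates the DRAM row. */
    static void hammer(volatile uint8_t *aggr_a, volatile uint8_t *aggr_b,
                       long iterations) {
        for (long i = 0; i < iterations; i++) {
            (void)*aggr_a;                      /* row activation via a plain read */
            (void)*aggr_b;
            _mm_clflush((const void *)aggr_a);  /* evict so the next read misses */
            _mm_clflush((const void *)aggr_b);
        }
    }

    int main(void) {
        /* Placeholder buffer: these offsets are almost certainly NOT adjacent
         * DRAM rows, so this compiles and runs but will not flip bits by itself. */
        uint8_t *buf = malloc(1 << 20);
        if (!buf) return 1;
        hammer(buf, buf + (1 << 13), 1000000L);
        free(buf);
        return 0;
    }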
GPUs have always been squarely in the "get stuff to consumers ASAP" camp, rather than NASA-like engineering that can withstand cosmic rays and such.
I also presume an EM simulation would be able to spot it, but it is also possible that, prior to rowhammer, no one ever thought to check for it (or, more likely, that they'd check the simulation with random or typical data inputs, not a hitherto-unthought-of attack pattern; that doesn't explain more modern hardware, though).
This is a huge theme for vulnerabilities. I almost said "modern", but looking back I've seen the cycle (disregard attacks as strictly hypothetical, then get caught unprepared when somebody publishes something that makes them practical) happen more than a few times.
(personally I think all RAM in all devices should be ECC)
RAM that doesn't behave like RAM is not RAM. It's defective. ECC is merely an attempt at fixing something that shouldn't have made it to market in the first place. AFAIK there is a rowhammer variant that manages to flip bits undetectably even with ECC RAM.
Single Error Correction, Double Error Detection, Triple Error Chaos.
It's more of a tragedy-of-the-commons problem. Consumers don't know what they don't know, and manufacturers need to be competitive with respect to each other. Without some kind of oversight (industry standards bodies or government regulation), or a level of shaming that breaks through to consumers (or e.g. class action lawsuits that impact manufacturers), no individual has any incentive to change.
It should be considered unethical to sell machines with non-ECC memory in any real volume.
News would latch on to it: "Hackers say all computers without ECC RAM are vulnerable and should not be purchased because of their insecurity. Manufacturers like Dell, Asus, Acer, ... are selling products that help hackers steal your information." "DefCon hackers thank Nvidia for making their jobs easier ..."
Such statements would be refreshed during and after each security conference. There are over 12 conferences a year, so about once a month these claims would be brought back into the public eye as a reminder. The public might stop purchasing from those manufacturers or choose the secure products, creating the change.
It was known as "pattern sensitivity" in the industry for decades, basically ever since the beginning, and considered a blocking defect. Here's a random article from 1989 (don't know why first page is missing, but look at the references): http://web.eecs.umich.edu/~mazum/PAPERS-MAZUM/patternsensiti...
Then some bastards like these came along...
https://research.ece.cmu.edu/safari/thesis/skhan_jobtalk_sli...
...and essentially said "who cares, let someone else be responsible for the imperfections while we can sell more crap", leading to the current mess we're in.
The flash memory industry took a similar dark turn decades ago.
Nothing is perfect, everything has its failure conditions. The question is where do you choose to place the bar? Do you want your component to work at 60, 80, or 100C? Do you want it to work in high radiation environments? Do you want it to withstand pathological access patterns?
So, in other words, there isn't a sufficient market for GPUs that cost double the $/GB of RAM but are resilient to rowhammer attacks to justify manufacturing them.
The positive part of the original rowhammer report was that it gave us a new tool to validate memory (it caused failures much faster than other validation methods).
Worst case scenario someone pulls this off using webgl and a website is able to corrupt your VRAM. They can't actually steal anything in that scenario (AFAIK) making it nothing more than a minor inconvenience.
I'm not certain that something which satisfies 1, let alone 3, necessarily exists. On the CPU you flip some bits related to privilege levels. Are any analogous and similarly exploitable data structures maintained by common GPU firmware? And if so, is such data stored in the bulk VRAM?
It wouldn't surprise me if it were possible, but it also seems entirely unrealistic: either you are restricted to an API such as WebGL (gl;hf), or you have native access, in which case you have better options available to you, seeing as the driver stacks are widely rumored to be security Swiss cheese (if you doubt this, look at how much effort Google has invested in sandboxing the "real" GPU driver away from apps on ChromeOS).