Linux Internals: How /proc/self/mem writes to unwritable memory (2021) (offlinemark.com)

124 points by medbar 2 days ago | 6 comments

hansendc 2 days ago
"On x86-64, there are two CPU settings which control the kernel’s ability to access memory."
There are a couple more than two, even in 2021.
Memory Protection Keys come to mind, as do the NPT/EPT tables when virtualization is in play. SEV and SGX also have their own ways of preventing the kernel from writing to memory. The CPU also has range registers that protect certain special physical address ranges, like the TDX module's range. You can't write there either.
That's all that comes to mind at the moment. It's definitely a fun question!
- karlgkk a day ago
  a thought: does MPK actually control the kernel's ability to access memory? on intel, i think if you try to read that memory, a page fault won't be thrown. although with PKS, kernel reads will cause a page fault.
  so can the kernel (ring0) freely read/write to memory encrypted with MPK? I think so, yes. good luck with whatever happens next tho lol
  - als0 a day ago
    There are two versions of MPK. One is only applicable to userspace pages. The other is newer and can be applied to kernel space pages; last time I checked, this was only available on newer Xeon processors.
    By the way, MPK memory is not encrypted. The key is just an identifier for the requestor. If the requestor key doesn’t match the same identifier for the memory page, then an exception is raised.
    Funnily enough, MPK isn’t new at all. It’s almost a reintroduction of a feature from Itanium.
    karlgkk 14 hours ago
    Aw, so I was half right. I knew the newer one, which is PKS, will throw a page fault. Sorry, it’s been a while since I’ve done this stuff and we were mostly working with tz
aliceryhl a day ago
Interesting. Though looking at the code, it does still check VM_MAYWRITE, so the mapping needs to be something you could remap as writable.
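A minimal sketch of what the VM_MAYWRITE check means in practice, assuming Linux with glibc and a readable /proc/self/mem. A MAP_SHARED mapping of a file opened O_RDONLY can never be remapped writable, so both mprotect() and a write through /proc/self/mem should be refused. The function name `forced_write_without_maywrite` is made up for this illustration.

```python
import ctypes
import errno
import mmap
import os
import tempfile


def forced_write_without_maywrite():
    """Return (mprotect_errno, proc_mem_errno, file_unchanged) for a shared
    mapping of a file that was opened read-only."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mmap.restype = ctypes.c_void_p
    libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                          ctypes.c_int, ctypes.c_int, ctypes.c_long]
    libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
    libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

    page = mmap.PAGESIZE
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"A" * page)
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY)
        try:
            # Shared mapping backed by a read-only file descriptor.
            addr = libc.mmap(None, page, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0)
            if addr is None or addr == ctypes.c_void_p(-1).value:
                raise OSError(ctypes.get_errno(), "mmap failed")
            try:
                # 1) Ask the kernel to make the mapping writable.
                ctypes.set_errno(0)
                rc = libc.mprotect(addr, page, mmap.PROT_READ | mmap.PROT_WRITE)
                mprotect_errno = ctypes.get_errno() if rc != 0 else 0

                # 2) Try the /proc/self/mem route at the same address.
                mem_fd = os.open("/proc/self/mem", os.O_RDWR)
                try:
                    os.pwrite(mem_fd, b"B", addr)
                    proc_mem_errno = 0
                except OSError as exc:
                    proc_mem_errno = exc.errno
                finally:
                    os.close(mem_fd)

                # 3) Check that the file contents were left alone.
                file_unchanged = os.pread(fd, 1, 0) == b"A"
            finally:
                libc.munmap(addr, page)
        finally:
            os.close(fd)
    return mprotect_errno, proc_mem_errno, file_unchanged


if __name__ == "__main__":
    m_err, p_err, unchanged = forced_write_without_maywrite()
    print("mprotect:", errno.errorcode.get(m_err, m_err))
    print("/proc/self/mem write:", errno.errorcode.get(p_err, p_err))
    print("file unchanged:", unchanged)
```

The exact errno returned by the /proc/self/mem write may vary by kernel version, so the sketch only records it rather than assuming a particular value.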
KenoFischer 2 days ago
I'm still surprised I was the first one to notice when Linus tried to change this - I always thought it was a pretty well known behavior.
anthk a day ago
/proc is a bad imitation of Plan 9's /proc.
bluepeter 2 days ago
The kernel owns the page tables. It can always find another way in.
- vlovich123 a day ago
  But the point here is that userspace can use this to bypass kernel protections that would otherwise prevent it from mutating R^X pages for example, not that the kernel can bypass its own.
  - im3w1l a day ago
    Those protections are mainly about preventing well intentioned people from accidentally shooting themselves in the foot though, right? So it's not really a big deal that there is a way around it.
    jcalvinowens a day ago
    No, page table write access allows arbitrary memory access because I can map any PFN I want. It's certainly a vector to execute arbitrary code in ring 0.
    vlovich123 a day ago
    It’s a huge deal. It’s a trivial gadget for building a larger exploit chain.
- mschuster91 2 days ago
  > The kernel owns the page tables.
  not entirely, IOMMU is a thing, that is IIRC how Amazon and other hyperscalers can promise you virtual machines whose memory cannot be touched even in the case the host is compromised (and, by extension, also if the feds arrive to v& your server).
  - gruez 2 days ago
    >how Amazon and other hyperscalers can promise you virtual machines whose memory cannot be touched even in the case the host is compromised (and, by extension, also if the feds arrive to v& your server).
    Even if we take those promises at face value, it practically doesn't mean much because every server still needs to handle reboots, which is when they can inject their evil code.
    Borealid 2 days ago
    MK-TME allows having memory encrypted at run time, and the platform TPM signs an attestation saying the memory was not altered.
    Malicious code can't be injected at boot without breaking that TPM.
    fc417fc802 2 days ago
    Subject to the huge caveat that the attacker does not have physical access. https://tee.fail/
    matheusmoreira 16 hours ago
    This is excellent. The ability to trick remote servers into believing our computers are "trusted" despite the fact we are in control will be a key capability in the future. We need stuff like this to maintain control over our computers.
    Borealid a day ago
    An interesting implementation flaw, but not a conceptual problem with the design.
    fc417fc802 a day ago
    Well, it kind of is actually. The previous iteration of the design didn't have that vulnerability but it was slower because managing IVs within the given constraints adds an additional layer of complexity. This is the pragmatic compromise so to speak.
    Does it count as a conceptual problem when technical challenges without an acceptable solution block your goal?
  - ronsor 2 days ago
    If your threat model is being v& by feds, maybe you should keep your server at home behind Tor.
    matheusmoreira 16 hours ago
    Proper OPSEC dictates that the server be located as far away from home as possible, ideally in a location with zero ties to your person.
    iberator a day ago
    Hosting a Tor outbound server at home is a stupid idea.
    Your home is gonna be raided by the police, you will wait months or a year to get your shit back, and then, if nothing else, you're gonna be charged for having pirated Windows and Photoshop lol
    real story
    r_lee a day ago
    lmao please tell more
    mschuster91 a day ago
    Not even two years ago, see https://www.golem.de/news/nach-hausdurchsuchung-deutscher-to...
    And it's not just a one off occurrence either. Tor exit node operators getting v& has been a thing for decades: https://www.heise.de/news/Anonymisierungsserver-bei-Razzia-b...
    mschuster91 a day ago
    These days, every American's threat model should include being v& by the feds, and here in Germany the situation isn't much better: you can get v& for saying the Minister of the Interior is a dick [1].
    Yes, this was later ruled unconstitutional, but it doesn't change the facts, and, worse, Germany doesn't have a "fruit of the poisonous tree" rule.
    [1] https://www.spiegel.de/panorama/justiz/hamburg-wohnungsdurch...
- pjmlp a day ago
  Not really. One of the security measures on Windows is exactly to control how the kernel can access secure process memory, as a possible mitigation for attacks by rogue drivers.
  Naturally it is the kind of stuff that requires Windows 11 vlatest with the nice Pluton security CPU, as part of the Copilot+ PC design.
haberman 2 days ago
TL;DR: when a user writes to /proc/self/mem, the kernel bypasses the MMU and hardware address translation, opting to emulate it in software (including emulated page faults!), which allows it to disregard any memory protection that is currently set up in the page tables.
- IAmLiterallyAB a day ago
  It doesn't bypass it exactly, it's still accessing it via virtual memory and the page tables. It's just that the kernel maintains one big linear memory map of RAM that's writable.
- rramadass 2 days ago
  Thank You.
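
For anyone who wants to see the behavior from the TL;DR above, here is a minimal sketch, assuming Linux with glibc and a kernel that still allows forced writes through /proc/self/mem (newer kernels can be configured to restrict this). It maps a private anonymous page, drops it to read-only with mprotect(), then modifies it anyway through /proc/self/mem. The function name `write_through_proc_self_mem` is made up for this illustration.

```python
import ctypes
import mmap
import os


def write_through_proc_self_mem(payload: bytes = b"patched") -> bytes:
    """Write `payload` into a read-only page via /proc/self/mem and return
    the bytes that are in the page afterwards."""
    page = mmap.PAGESIZE

    # Private anonymous mapping; it starts read/write so it can be filled.
    m = mmap.mmap(-1, page,
                  flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                  prot=mmap.PROT_READ | mmap.PROT_WRITE)
    m[:8] = b"original"

    # Find the virtual address of the mapping.
    buf = ctypes.c_char.from_buffer(m)
    addr = ctypes.addressof(buf)

    # Drop the page to read-only. From here on, an ordinary store to this
    # address from userspace would raise SIGSEGV.
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
    if libc.mprotect(addr, page, mmap.PROT_READ) != 0:
        raise OSError(ctypes.get_errno(), "mprotect failed")

    # The file offset in /proc/self/mem is the virtual address to access.
    fd = os.open("/proc/self/mem", os.O_RDWR)
    try:
        os.pwrite(fd, payload, addr)
    finally:
        os.close(fd)

    # Reading is still permitted, so look at what the page holds now.
    result = m[:len(payload)]
    del buf      # release the buffer export so the mapping can be closed
    m.close()
    return result


if __name__ == "__main__":
    print(write_through_proc_self_mem())
```

The mapping is deliberately MAP_PRIVATE: the forced write path only applies to copy-on-write mappings, so a read-only shared mapping would be expected to reject the write instead.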