linux-hardening - Software based row hammer mitigation for systems with memory error detection

Message-ID: <CAKO8emZtcfPk4PvFXkvV85Jx-cHh0Qq0R8bfpW1a97E+1BQXQg@mail.gmail.com>
Date:   Tue, 5 Sep 2023 14:27:22 +1000
From:   Dan Farrell <djfarrell@...il.com>
To:     linux-hardening@...r.kernel.org
Subject: Software based row hammer mitigation for systems with memory error detection

Hi!

As per the subject, I have an idea for a software-based mitigation for Row
Hammer-type attacks, in particular on systems that use ECC or have other error
detection mechanisms.

I was just hoping to get the idea out there to see if there is any interest in
putting it in the kernel.

The idea is pretty simple, although I imagine implementation may be a pain.

ECCploit[1] works by inducing a single-bit error in a page, then reading that
page and observing that the read has higher latency (due to ECC processing).

However, the kernel, upon detecting an ECC error, could copy that physical page
to another region/page, and remap the virtual page to point to the new physical
region/page.

Further Row Hammering would then be off target, since the data would no longer
live in the physical page being hammered.
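
To make the copy-and-remap step concrete, here is a minimal user-space sketch
in C. It is a toy simulation, not kernel code: the names (phys_mem,
page_table, handle_corrected_error and so on) are all hypothetical, the "page
table" is a plain array, and it assumes the error report tells us which
physical frame was affected. Real kernel work such as locking, TLB
invalidation and handling pages with several mappings is deliberately left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE   4096
#define NUM_FRAMES  8      /* toy "physical memory": 8 frames */
#define NUM_VPAGES  4      /* toy address space: 4 virtual pages */
#define NO_FRAME    (-1)

/* All names below are hypothetical, for illustration only. */
unsigned char phys_mem[NUM_FRAMES][PAGE_SIZE];
bool frame_in_use[NUM_FRAMES];       /* mapped or quarantined */
bool frame_quarantined[NUM_FRAMES];  /* retired after an ECC report */
int  page_table[NUM_VPAGES];         /* virtual page -> physical frame */

/* Reset the simulated memory and unmap every virtual page. */
void sim_init(void)
{
    memset(phys_mem, 0, sizeof(phys_mem));
    memset(frame_in_use, 0, sizeof(frame_in_use));
    memset(frame_quarantined, 0, sizeof(frame_quarantined));
    for (int v = 0; v < NUM_VPAGES; v++)
        page_table[v] = NO_FRAME;
}

/* Return the first free frame and mark it used, or NO_FRAME if none left. */
int alloc_frame(void)
{
    for (int f = 0; f < NUM_FRAMES; f++) {
        if (!frame_in_use[f]) {
            frame_in_use[f] = true;
            return f;
        }
    }
    return NO_FRAME;
}

/* Back virtual page `vpn` with a fresh frame; returns the frame or NO_FRAME. */
int map_page(int vpn)
{
    assert(vpn >= 0 && vpn < NUM_VPAGES);
    int f = alloc_frame();
    if (f != NO_FRAME)
        page_table[vpn] = f;
    return f;
}

/*
 * Simulated reaction to a corrected ECC error reported for `bad_frame`:
 * copy the contents to a new frame, point the virtual page at the new
 * frame, and quarantine the old one so it is never handed out again.
 * Returns the new frame, or NO_FRAME if nothing was remapped.
 */
int handle_corrected_error(int bad_frame)
{
    assert(bad_frame >= 0 && bad_frame < NUM_FRAMES);

    for (int v = 0; v < NUM_VPAGES; v++) {
        if (page_table[v] != bad_frame)
            continue;

        int new_frame = alloc_frame();
        if (new_frame == NO_FRAME)
            return NO_FRAME;           /* out of memory: leave mapping alone */

        memcpy(phys_mem[new_frame], phys_mem[bad_frame], PAGE_SIZE);
        page_table[v] = new_frame;     /* the "remap" step */
        frame_quarantined[bad_frame] = true;  /* stays marked in use */
        return new_frame;
    }

    /* Frame was not mapped: just make sure it is never allocated. */
    frame_in_use[bad_frame] = true;
    frame_quarantined[bad_frame] = true;
    return NO_FRAME;
}
```

In this sketch the quarantined frame stays marked as in use, so the allocator
never returns it again. That is the same role the spare-block pool plays on a
drive.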

This is similar to how a non-volatile drive detects faulty (or nearly faulty)
blocks and remaps them to spare space set aside on the drive.

Anyway, if this is a bad idea, that's fine too. Full disclosure: I did a noob
thing and initially sent this message to the security mailing list - but I did
get a reply from Linus, who I think has reservations about such a technique:

On Tue, 5 Sept 2023 at 03:54, Linus Torvalds
<torvalds@...uxfoundation.org> wrote:
>
> On Mon, 4 Sept 2023 at 00:39, Dan Farrell <djfarrell@...il.com> wrote:
> >
> > However, the kernel, upon detecting an ECC error, could copy that physical page
> > to another region/page, and remap the virtual page to point to the new physical
> > region/page.
>
> Honestly, any hardware that makes correctable ECC errors synchronous
> is already broken.
>
> ECCploit is just one example of how broken that approach is, but it's
> just broken in general. The kernel doesn't want an immediate "you had
> a correctable error" machine check, and in fact in many situations
> cannot sanely deal with such a thing - very much including doing major
> surgery like moving pages around for it.
>
> Realistically, what we want for rowhammer is *detection* that
> hammering is going on. The exploit will take long enough that once you
> can detect it, the fix is to stop the exploit, not to try to do
> anything fancier.

Please let me know if this would be useful in the kernel. I could probably
implement it eventually (supposing it is implementable), but my only kernel
hacking experience is with drivers, so I would be facing a steep learning
curve.

Regards,

Dan Farrell

[1] https://www.vusec.net/projects/eccploit/