Date:   Tue, 31 Aug 2021 11:39:21 -0700
From:   "Luck, Tony" <tony.luck@...el.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Dan Williams <dan.j.williams@...el.com>,
        LKML <linux-kernel@...r.kernel.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Lukas Bulwahn <lukas.bulwahn@...il.com>
Subject: Re: [patch 01/10] x86/fpu/signal: Clarify exception handling in
 restore_fpregs_from_user()

On Tue, Aug 31, 2021 at 09:39:30AM +0200, Borislav Petkov wrote:
> On Tue, Aug 31, 2021 at 02:34:16AM +0200, Thomas Gleixner wrote:
> No no, the great way to do error injection is the ACPI-spec'ed, firmware
> implemented
> 
> drivers/acpi/apei/einj.c
> 
> Yap, you heard me right, firmware. And when you hear firmware, you can
> imagine how it all works in practice... Yeap, exactly.

You can imagine all you want. And if your imagination is based
on experiences with very old systems like Haswell (launched in 2015)
then you'd be right to be skeptical of firmware capabilities.

> We even wrote documentation what to do:
> 
> Documentation/firmware-guide/acpi/apei/einj.rst
> 
> But but, this is firmware so
> 
> - it is f*cking broken in all ways imaginable

s/is/was/

> 
> - if it works, it doesn't support the error type which you wanna inject

Memory errors now have very good coverage. Still some issues with PCIe injection.

> - if it does, enterprise sh*t hw has added value crap which analyzes and
> looks at hardware errors first </me rolls eyes, trying to remain serious>
> so you might get the error report if you get lucky.

Turn off eMCA in BIOS to avoid this.

> > The HW injection mechanisms definitely exist, but without documentation
> > they are useless. Intel still thinks that the secrecy around that stuff
> > is valuable and they can get away with those untestable mechanisms even
> > for their endeavours in the safety critical space.

The injection controls in the memory controller can only be accessed
in SMM. There is some paranoia that a ring0 attack could inject
errors at random intervals, causing major costs to diagnose and replace
"failing" DIMMs. So documentation wouldn't help Linux, because it just
can't twiddle the necessary bits in the h/w.

> My impression with error injection with hw people is just like what they
> do with perf counters: it counts *something* right? You should be happy
> that it does.

This was true <= Haswell. But definitely not true now. The h/w groups
now have validation teams that depend on ACPI/EINJ for many of their
system level tests. Those guys are serious about this stuff. While I'll
just inject 1000 errors on a single machine and call it good if it all
goes as expected, those folks have (small) clusters running injection
tests 24x7 for weeks at a time.
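
For anyone who wants to try the "inject N errors on one machine" kind of
run, here is a minimal sketch of driving the EINJ debugfs interface that
Documentation/firmware-guide/acpi/apei/einj.rst describes. It assumes
debugfs is mounted at /sys/kernel/debug, the einj module is loaded, and
you are root. The function names and the physical address in the usage
comment are made up for illustration; the address has to be real,
usable memory on the machine under test.

```python
from pathlib import Path

# Assumed default location of the EINJ debugfs files (see einj.rst).
EINJ_DIR = Path("/sys/kernel/debug/apei/einj")

# Error type value for "Memory Correctable" as listed in einj.rst.
MEM_CORRECTABLE = 0x8

# Mask selecting a 4K-aligned address range (param2 in einj.rst).
PAGE_MASK = 0xFFFFFFFFFFFFF000


def inject_memory_error(phys_addr, einj_dir=EINJ_DIR,
                        error_type=MEM_CORRECTABLE, mask=PAGE_MASK):
    """Hypothetical helper: request one injection at phys_addr."""
    einj_dir = Path(einj_dir)
    (einj_dir / "param1").write_text(f"{phys_addr:#x}\n")       # target address
    (einj_dir / "param2").write_text(f"{mask:#x}\n")            # address mask
    (einj_dir / "error_type").write_text(f"{error_type:#x}\n")  # what to inject
    (einj_dir / "error_inject").write_text("1\n")               # trigger it


def inject_many(phys_addr, count, einj_dir=EINJ_DIR):
    """Hypothetical helper: repeat the injection count times."""
    for _ in range(count):
        inject_memory_error(phys_addr, einj_dir)
    return count


# Example (address is a placeholder, not a recommendation):
#   inject_many(0x12345000, 1000)
```

Whether each injection actually shows up in the kernel log still depends
on the platform, so check the logs rather than trusting the loop count.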

Downsides of ACPI/EINJ today:
1) Availability on production machines. It is always disabled by default
in BIOS. OEMs may not provide a setup option to turn it on (or may have
deleted the code to support it completely). Intel's pre-production servers
always have the code, and the setup option to enable.
2) Doesn't inject to 3D XPoint (that has its own injection method, but
it is annoying to have to juggle two methods).
3) Hard/impossible to inject into SGX memory (because BIOS is untrusted
and isn't allowed to do a store to push the poison data to DDR).
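
A quick way to see where a given machine stands on point 1 is to check
whether firmware exposes an EINJ table at all, and then which error
types it advertises. The sketch below assumes the usual sysfs/debugfs
locations (/sys/firmware/acpi/tables/EINJ and the available_error_type
file described in einj.rst); the function names and status strings are
illustrative only.

```python
from pathlib import Path

# Assumed default locations; both can be overridden by the caller.
ACPI_EINJ_TABLE = "/sys/firmware/acpi/tables/EINJ"
EINJ_DIR = "/sys/kernel/debug/apei/einj"


def einj_status(acpi_table=ACPI_EINJ_TABLE, einj_dir=EINJ_DIR):
    """Hypothetical helper: report why EINJ is or is not usable."""
    if not Path(acpi_table).exists():
        return "no EINJ table (disabled in BIOS or removed by the OEM)"
    if not (Path(einj_dir) / "available_error_type").exists():
        return "EINJ table present, debugfs interface missing"
    return "available"


def available_error_types(einj_dir=EINJ_DIR):
    """Parse available_error_type into {value: description}."""
    types = {}
    text = (Path(einj_dir) / "available_error_type").read_text()
    for line in text.splitlines():
        parts = line.split(None, 1)  # "0x00000008<ws>Memory Correctable"
        if len(parts) == 2:
            types[int(parts[0], 16)] = parts[1].strip()
    return types
```

If the second status comes back, loading the einj module and mounting
debugfs are the first things to check.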

-Tony
