lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190410172539.GF26580@zn.tnic>
Date:   Wed, 10 Apr 2019 19:25:39 +0200
From:   Borislav Petkov <bp@...en8.de>
To:     "Ghannam, Yazen" <Yazen.Ghannam@....com>
Cc:     "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "tony.luck@...el.com" <tony.luck@...el.com>,
        "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way

On Wed, Apr 10, 2019 at 04:58:12PM +0000, Ghannam, Yazen wrote:
> Yes, unused banks in the middle are counted in the MCG_CAP[Count] value.

Good.

> Okay, so you're saying the sysfs access should fail if a bank is
> disabled. Is that correct?

Well, think about it. If a bank is not operational for whatever reason,
we should tell the user that.

> Does "disabled" mean one or both of these?
> Unused = RAZ/WI in hardware
> Uninitialized = Not initialized by kernel due to quirks, etc.
> 
> For an unused bank, it doesn't hurt to write MCA_CTL, but really
> there's no reason to do so and go through mce_restart().

Yes, but that bank is non-operational in some form. So we should prevent
all writes to it because, well, it is not going to do anything. And this
would be a good way to give feedback to the user that that is the case.

> For an uninitialized bank, should we prevent users from overriding the
> kernel's settings?

That all depends on the quirks. Whether we should allow them to be
overridden or not. I don't think we've ever thought about it, though.

Let's look at one:

        if (c->x86_vendor == X86_VENDOR_AMD) {
                if (c->x86 == 15 && cfg->banks > 4) {
                        /*
                         * disable GART TBL walk error reporting, which
                         * trips off incorrectly with the IOMMU & 3ware
                         * & Cerberus:
                         */
                        clear_bit(10, (unsigned long *)&mce_banks[4].ctl);


Yah, so if the user reenables those GART errors, then she/he will see a
lot of MCEs reported and will maybe complain about it. And then we'll
say, but why did you enable them then. And she/he'll say: uh, didn't
know. Or, I was just poking at sysfs and this happened.

Then we can say, well, don't do that then! :-)

So my current position is, meh, who cares. But then I'm looking at
another quirk:

        if (c->x86_vendor == X86_VENDOR_INTEL) {
                /*
                 * SDM documents that on family 6 bank 0 should not be written
                 * because it aliases to another special BIOS controlled
                 * register.
                 * But it's not aliased anymore on model 0x1a+
                 * Don't ignore bank 0 completely because there could be a
                 * valid event later, merely don't write CTL0.
                 */

                if (c->x86 == 6 && c->x86_model < 0x1A && cfg->banks > 0)
                        mce_banks[0].init = 0;


which basically prevents that bank from being reinitialized. So I guess
we have that functionality already - we simply need to pay attention to
w->init.

Right?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ