Message-ID: <Y2HXoHeFt+BJRb6n@yaz-fattaah>
Date:   Wed, 2 Nov 2022 02:36:16 +0000
From:   Yazen Ghannam <yazen.ghannam@....com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Greg KH <gregkh@...uxfoundation.org>, linux-edac@...r.kernel.org,
        linux-kernel@...r.kernel.org, tony.luck@...el.com, x86@...nel.org,
        Smita.KoralahalliChannabasappa@....com, mpatocka@...hat.com
Subject: Re: [PATCH] x86/MCE/AMD: Decrement threshold_bank refcount when
 removing threshold blocks

On Wed, Oct 26, 2022 at 10:12:15PM +0200, Borislav Petkov wrote:
> On Wed, Oct 26, 2022 at 07:44:17PM +0000, Yazen Ghannam wrote:
> > 1) Apply the patch I submitted as a simple fix/workaround for the presented
> > symptom. I tried to keep it small and well described to be a stable backport.
> > Obviously I wrote it without knowing the shared kobject behavior isn't ideal.
> 
> We'll see.
> 
> > 2) Address the shared kobject thing.
> >    Here are some options:
> >    a. Only set up the thresholding kobject on a single CPU per "AMD Node".
> >    Technically MCA Bank 4 is "shared" on legacy systems. But AFAICT from
> >    looking at old BKDG docs, in practice only the "Node Base Core" can access
> >    the registers. This behavior is controlled by a bit in NB which BIOS is
> >    supposed to set. Maybe some BIOSes don't do this, but I think that's a
> >    "broken BIOS on legacy system" issue if so.
> 
> I guess we can do that. And I even think we have some code which finds
> out which the NBC is...
> 
> /me greps a bit:
> 
> ah, there it is: get_nbc_for_node() in arch/x86/kernel/cpu/mce/inject.c.
> 
> 
> >    b. Disable the MCA Thresholding interface for Families before 0x17.
> 
> Can't. It is user-visible and you don't know for sure whether someone is
> using it or not.
> 
> Believe me, I have been wanting to disable this thing forever. I've
> never heard of anyone using it and all the energy we put in it was for
> nothing. :-\
> 
> We could try to deprecate it, though, make it default=n in Kconfig and
> see who complains. And after a couple of releases, kill it.
> 
> >    This is an undocumented interface, 
> 
> Of course it is documented - it is in the old BKDGs.
> 
> > and I don't know if anyone is using it on older systems.
> 
> Yap.
> 
> >    The issue we're discussing here started because of a splat during
> >    suspend/resume/CPU hotplug. In disable_err_thresholding(), we disable
> >    MCA Thresholding for bank 4 on Family 15h, so there's some precedent.
> >    c. Do nothing at the moment. I *really* want to clean up the MCA
> >    Thresholding interface, and the shared kobject thing may get resolved
> >    in that.
> 
> Clean it up how exactly?
> 
> Put it behind a Kconfig item, disable it and remove it after a while?
> 
> :-)
> 
> If so, I wouldn't mind. No one's using this. At least I haven't heard of
> a single bug report or of a use case. Only when CPU hotplug explodes and
> that thing is involved, only then.
> 
> Might as well remove it. And then remove it in the hardware too. RAS
> folks would love to get rid of some of that crap which takes up verif
> resources for no good reason.
> 
> :-)
>

Cool beans. I think this'll be a long process, so let me start by removing the
shared bank stuff. Thanks!

-Yazen
