lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y1mOEfEM6MdnV8CX@yaz-fattaah>
Date:   Wed, 26 Oct 2022 19:44:17 +0000
From:   Yazen Ghannam <yazen.ghannam@....com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Greg KH <gregkh@...uxfoundation.org>, linux-edac@...r.kernel.org,
        linux-kernel@...r.kernel.org, tony.luck@...el.com, x86@...nel.org,
        Smita.KoralahalliChannabasappa@....com, mpatocka@...hat.com
Subject: Re: [PATCH] x86/MCE/AMD: Decrement threshold_bank refcount when
 removing threshold blocks

On Wed, Oct 26, 2022 at 08:29:51PM +0200, Borislav Petkov wrote:
> On Wed, Oct 26, 2022 at 03:39:15PM +0000, Yazen Ghannam wrote:
> > What's the issue with my original patch?
> 
> Do you see it?
> 
> > @@ -1258,10 +1258,10 @@ static void __threshold_remove_blocks(struct threshold_bank *b)
> >         struct threshold_block *pos = NULL;
> >         struct threshold_block *tmp = NULL;
> >  
> > -       kobject_del(b->kobj);
> > +       kobject_put(b->kobj);
> >  
> >         list_for_each_entry_safe(pos, tmp, &b->blocks->miscj, miscj)
> > -               kobject_del(&pos->kobj);
> > +               kobject_put(b->kobj);
> 
> You're basically putting the parent as many times as there are elements
> on the ->miscj list.
> 
> Basically what Greg doesn't like.
> 
> Him and I need to talk it over first whether my gross hack of grafting
> the bank4 kobject hierarchy from CPU0 onto the other CPUs on the node is
> even viable so stay tuned...
> 
> > I think this is the simplest way to fix the current implementation.
> > But we should probably get rid of this kobject sharing idea in light
> > of Greg's comments.
> 
> You said it. :)
> 
> Or maybe do a better one.

Right, so can we do the following two things?

1) Apply the patch I submitted as a simple fix/workaround for the presented
symptom. I tried to keep it small and well described to be a stable backport.
Obviously I wrote it without knowing the shared kobject behavior isn't ideal.

2) Address the shared kobject thing.
   Here are some options:
   a. Only set up the thresholding kobject on a single CPU per "AMD Node".
   Technically MCA Bank 4 is "shared" on legacy systems. But AFAICT from
   looking at old BKDG docs, in practice only the "Node Base Core" can access
   the registers. This behavior is controlled by a bit in NB which BIOS is
   supposed to set. Maybe some BIOSes don't do this, but I think that's a
   "broken BIOS on legacy system" issue if so.
   b. Disable the MCA Thresholding interface for Families before 0x17. This is
   an undocumented interface, and I don't know if anyone is using it on older
   systems. The issue we're discussing here started because of a splat during
   suspend/resume/CPU hotplug. In disable_err_thresholding(), we disable MCA
   Thresholding for bank 4 on Family 15h, so there's some precedent.
   c. Do nothing at the moment. I *really* want to clean up the MCA
   Thresholding interface, and the shared kobject thing may get resolved in
   that.

What do you think?

Thanks,
Yazen

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ