Date:   Sat, 22 Jul 2023 10:38:13 +0200 (CEST)
From:   Mikulas Patocka <mpatocka@...hat.com>
To:     Yazen Ghannam <yazen.ghannam@....com>
cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Borislav Petkov <bp@...e.de>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Tony Luck <tony.luck@...el.com>, linux-kernel@...r.kernel.org,
        linux-edac@...r.kernel.org
Subject: Re: Warnings when suspending to disk

Hi

I upgraded the kernel to 6.4.4 and it still has this bug (see 
https://lore.kernel.org/lkml/YqD1YjeovGu28xsP@yaz-fattaah/T/ for the 
beginning of this thread).

The patch below fixes these warnings.

I'd like to ask you to submit the patch upstream, with

Fixes: 7f99cb5e6039 ("x86/CPU/AMD: Use default_groups in kobj_type")
Cc: stable@...r.kernel.org	# v5.18+
Tested-by: Mikulas Patocka <mpatocka@...hat.com>

Mikulas



On Sat, 11 Jun 2022, Mikulas Patocka wrote:

> 
> 
> On Wed, 8 Jun 2022, Yazen Ghannam wrote:
> 
> > On Fri, Jun 03, 2022 at 01:34:26PM -0400, Mikulas Patocka wrote:
> > 
> > ...
> > 
> > > I tried this patch and it doesn't help.
> > 
> > Thanks Mikulas for testing.
> > 
> > I'm still not able to reproduce the exact issue. But I was able to reproduce
> > the same symptom by hacking the kernel and doing CPU hotplug.
> 
> I also see the warnings when disabling cores.
> 
> > Can you please try the following patch? This seems to work in my hacked case.
> > I also tried to write out a detailed description of the issue to the best of
> > my knowledge.
> 
> This patch works - there are no longer any warnings on CPU disable or on 
> suspend to disk.
> 
> Mikulas
> 
> > Thanks,
> > Yazen
> > 
> > ========================
> > 
> > From d1fa5cdc7f29bf810215f0a83f16bc7435e55240 Mon Sep 17 00:00:00 2001
> > From: Yazen Ghannam <yazen.ghannam@....com>
> > Date: Mon, 6 Jun 2022 19:45:56 +0000
> > Subject: [PATCH] x86/MCE/AMD: Decrement threshold_bank refcount when removing
> >  threshold blocks
> > 
> > AMD systems from Family 10h to 16h share MCA bank 4 across multiple CPUs.
> > Therefore, the threshold_bank structure for bank 4, and its threshold_block
> > structures, will be initialized once at boot time. And the kobject for the
> > shared bank will be added to each of the CPUs that share it. Furthermore,
> > the threshold_blocks for the shared bank will be added again to the bank's
> > kobject. These additions will increase the refcount for the bank's kobject.
> > 
> > For example, a shared bank with two blocks and shared across two CPUs will
> > be set up like this:
> > 
> > CPU0 init
> >   bank create and add; bank refcount = 1; threshold_create_bank()
> >     block 0 init and add; bank refcount = 2; allocate_threshold_blocks()
> >     block 1 init and add; bank refcount = 3; allocate_threshold_blocks()
> > CPU1 init
> >   bank add; bank refcount = 3; threshold_create_bank()
> >     block 0 add; bank refcount = 4; __threshold_add_blocks()
> >     block 1 add; bank refcount = 5; __threshold_add_blocks()
> > 
> > Currently in threshold_remove_bank(), if the bank is shared then
> > __threshold_remove_blocks() is called. Here the shared bank's kobject and
> > the bank's blocks' kobjects are deleted. This is done on the first call
> > even while the structures are still shared. Subsequent calls from other
> > CPUs that share the structures will attempt to delete the kobjects.
> > 
> > During kobject_del(), kobject->sd is removed. If the kobject is not part of
> > a kset with default_groups, then subsequent kobject_del() calls seem safe
> > even with kobject->sd == NULL.
> > 
> > Originally, the AMD MCA thresholding structures did not use default_groups.
> > And so the above behavior was not apparent.
> > 
> > However, a recent change implemented default_groups for the thresholding
> > structures. Therefore, kobject_del() will go down the sysfs_remove_groups()
> > code path. In this case, the first kobject_del() may succeed and remove
> > kobject->sd. But subsequent kobject_del() calls will give a WARNing in
> > kernfs_remove_by_name_ns() since kobject->sd == NULL.
> > 
> > Use kobject_put() on the shared bank's kobject when "removing" blocks. This
> > decrements the bank's refcount while keeping kobjects enabled until the
> > bank is no longer shared. At that point, kobject_put() will be called on
> > the blocks, which drives their refcount to 0, deletes them, and also
> > decrements the bank's refcount. And finally kobject_put() will be called
> > on the bank, driving its refcount to 0 and deleting it.
> > 
> > With this patch and the example above:
> > 
> > CPU1 shutdown
> >   bank is shared; bank refcount = 5; threshold_remove_bank()
> >     block 0 put parent bank; bank refcount = 4; __threshold_remove_blocks()
> >     block 1 put parent bank; bank refcount = 3; __threshold_remove_blocks()
> > CPU0 shutdown
> >   bank is no longer shared; bank refcount = 3; threshold_remove_bank()
> >     block 0 put block; bank refcount = 2; deallocate_threshold_blocks()
> >     block 1 put block; bank refcount = 1; deallocate_threshold_blocks()
> >   put bank; bank refcount = 0; threshold_remove_bank()
> > 
> > Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
> 
> Tested-by: Mikulas Patocka <mpatocka@...hat.com>
> 
> > ---
> >  arch/x86/kernel/cpu/mce/amd.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> > index 2b7ee4a6c6ba..680b75d23a03 100644
> > --- a/arch/x86/kernel/cpu/mce/amd.c
> > +++ b/arch/x86/kernel/cpu/mce/amd.c
> > @@ -1260,10 +1260,10 @@ static void __threshold_remove_blocks(struct threshold_bank *b)
> >  	struct threshold_block *pos = NULL;
> >  	struct threshold_block *tmp = NULL;
> >  
> > -	kobject_del(b->kobj);
> > +	kobject_put(b->kobj);
> >  
> >  	list_for_each_entry_safe(pos, tmp, &b->blocks->miscj, miscj)
> > -		kobject_del(&pos->kobj);
> > +		kobject_put(b->kobj);
> >  }
> >  
> >  static void threshold_remove_bank(struct threshold_bank *bank)
> > -- 
> > 2.25.1
> > 
> 
