Message-ID: <SN6PR12MB2639571E33EBC7342A0607F8F8070@SN6PR12MB2639.namprd12.prod.outlook.com>
Date: Tue, 21 May 2019 17:52:42 +0000
From: "Ghannam, Yazen" <Yazen.Ghannam@....com>
To: Borislav Petkov <bp@...en8.de>
CC: "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"bp@...e.de" <bp@...e.de>,
"tony.luck@...el.com" <tony.luck@...el.com>,
"x86@...nel.org" <x86@...nel.org>
Subject: RE: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu
> -----Original Message-----
> From: Borislav Petkov <bp@...en8.de>
> Sent: Saturday, May 18, 2019 6:26 AM
> To: Ghannam, Yazen <Yazen.Ghannam@....com>
> Cc: linux-edac@...r.kernel.org; linux-kernel@...r.kernel.org; bp@...e.de; tony.luck@...el.com; x86@...nel.org
> Subject: Re: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu
>
>
> On Tue, Apr 30, 2019 at 08:32:20PM +0000, Ghannam, Yazen wrote:
> > From: Yazen Ghannam <yazen.ghannam@....com>
> >
> > The number of MCA banks is provided per logical CPU. Historically, this
> > number has been the same across all CPUs, but this is not an
> > architectural guarantee. Future AMD systems may have MCA bank counts
> > that vary between logical CPUs in a system.
> >
> > This issue was partially addressed in
> >
> > 006c077041dc ("x86/mce: Handle varying MCA bank counts")
> >
> > by allocating structures using the maximum number of MCA banks and by
> > saving the maximum MCA bank count in a system as the global count. This
> > means that some extra structures are allocated. Also, this means that
> > CPUs will spend more time in the #MC and other handlers checking extra
> > MCA banks.
>
> ...
>
> > @@ -1480,14 +1482,15 @@ EXPORT_SYMBOL_GPL(mce_notify_irq);
> >
> > static int __mcheck_cpu_mce_banks_init(void)
> > {
> > + u8 n_banks = this_cpu_read(mce_num_banks);
> > struct mce_bank *mce_banks;
> > int i;
> >
> > - mce_banks = kcalloc(MAX_NR_BANKS, sizeof(struct mce_bank), GFP_KERNEL);
> > + mce_banks = kcalloc(n_banks, sizeof(struct mce_bank), GFP_KERNEL);
>
> Something changed in mm land or maybe we were lucky and got away with an
> atomic GFP_KERNEL allocation until now but:
>
> [ 2.447838] smp: Bringing up secondary CPUs ...
> [ 2.456895] x86: Booting SMP configuration:
> [ 2.457822] .... node #0, CPUs: #1
The issue seems to be that the allocation is now happening on CPUs other than CPU0.
Patch 2 in this set has the same issue. I didn't see it until I turned on the "Lock Debugging" config options.
> [ 1.344284] BUG: sleeping function called from invalid context at mm/slab.h:418
This message comes from ___might_sleep(), which checks system_state.
On CPU0, system_state=SYSTEM_BOOTING, and ___might_sleep() suppresses the warning while the system is still in that state.
On every other CPU, system_state=SYSTEM_SCHEDULING by the time it comes up, so the warning fires there.
Changing GFP_KERNEL to GFP_ATOMIC seems to fix it. Is this appropriate, or do you think there's something else we could try?
Thanks,
Yazen