lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 29 Feb 2024 09:21:02 -0800
From: Tony Luck <tony.luck@...el.com>
To: Sohil Mehta <sohil.mehta@...el.com>
Cc: Borislav Petkov <bp@...en8.de>, "Naik, Avadhut" <avadnaik@....com>,
	"x86@...nel.org" <x86@...nel.org>,
	"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"yazen.ghannam@....com" <yazen.ghannam@....com>,
	Avadhut Naik <avadhut.naik@....com>
Subject: Re: [PATCH] x86/mce: Dynamically size space for machine check records

On Wed, Feb 28, 2024 at 05:56:26PM -0800, Sohil Mehta wrote:
> A few other nits.
> 
> On 2/28/2024 3:14 PM, Tony Luck wrote:
> > diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
> > index fbe8b61c3413..a1f0a8f29cf5 100644
> > --- a/arch/x86/kernel/cpu/mce/genpool.c
> > +++ b/arch/x86/kernel/cpu/mce/genpool.c
> > @@ -16,14 +16,13 @@
> >   * used to save error information organized in a lock-less list.
> >   *
> >   * This memory pool is only to be used to save MCE records in MCE context.
> > - * MCE events are rare, so a fixed size memory pool should be enough. Use
> > - * 2 pages to save MCE events for now (~80 MCE records at most).
> > + * MCE events are rare, so a fixed size memory pool should be enough.
> > + * Allocate on a sliding scale based on number of CPUs.
> >   */
> > -#define MCE_POOLSZ	(2 * PAGE_SIZE)
> > +#define MCE_MIN_ENTRIES	80
> >  
> >  static struct gen_pool *mce_evt_pool;
> >  static LLIST_HEAD(mce_event_llist);
> > -static char gen_pool_buf[MCE_POOLSZ];
> >  
> >  /*
> >   * Compare the record "t" with each of the records on list "l" to see if
> > @@ -118,14 +117,25 @@ int mce_gen_pool_add(struct mce *mce)
> >  
> >  static int mce_gen_pool_create(void)
> >  {
> > +	int mce_numrecords, mce_poolsz;
> 
> Should order be also declared in this line? That way we can have all the
> uninitialized 'int's together.

Sure. Even with the addition of "order" the line is still short enough.

> >  	struct gen_pool *tmpp;
> >  	int ret = -ENOMEM;
> > +	void *mce_pool;
> > +	int order;
> >  
> > -	tmpp = gen_pool_create(ilog2(sizeof(struct mce_evt_llist)), -1);
> > +	order = ilog2(sizeof(struct mce_evt_llist)) + 1;
> 
> I didn't exactly understand why a +1 is needed here. Do you have a
> pointer to somewhere to help understand this?
> 
> Also, I think, a comment on top might be useful since this isn't obvious.

gen_pool works in power-of-two blocks. The user must pick a specific
size with the gen_pool_create() call. Internally the gen_pool code uses
a bitmap to track which blocks in the pool are allocated/free.

Looking at this specific case, sizeof(struct mce_evt_llist) is 136. So
the original version of this code picks order 7 to allocate in 128 byte
units. But this means that every allocation of a mce_evt_llist will take
two 128-byte blocks.

Net result is that the comment at the top of arch/x86/kernel/cpu/mce/genpool.c
that two pages are enough for ~80 records was wrong when written. At
that point struct mce_evt_llist was below 128, so order was 6, and each
allocation took two blocks. So two pages = 8192 bytes divided by (2 * 64)
results in 64 possible allocations.

But over time Intel and AMD added to the structure. So the current math
comes out at just 32 allocations before the pool is out of space.

Yazen provided the right answer for this. Change to use order_base_2()

> > +	tmpp = gen_pool_create(order, -1);
> >  	if (!tmpp)
> >  		goto out;
> >  
> > -	ret = gen_pool_add(tmpp, (unsigned long)gen_pool_buf, MCE_POOLSZ, -1);
> > +	mce_numrecords = max(80, num_possible_cpus() * 4);
> 
> How about using MCE_MIN_ENTRIES here?

Oops. I meant to do that when I added the #define!

I've also added a "#define MCE_PER_CPU 4" instead of
a raw "4" in that expression.

-Tony

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ