Message-ID: <ZqRmO6LNol6S65dm@snowbird>
Date: Fri, 26 Jul 2024 20:15:07 -0700
From: Dennis Zhou <dennis@...nel.org>
To: Boqun Feng <boqun.feng@...il.com>
Cc: Tejun Heo <tj@...nel.org>, kernel test robot <oliver.sang@...el.com>,
	Suren Baghdasaryan <surenb@...gle.com>, oe-lkp@...ts.linux.dev,
	lkp@...el.com, linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Kent Overstreet <kent.overstreet@...ux.dev>,
	Kees Cook <keescook@...omium.org>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Alex Gaynor <alex.gaynor@...il.com>,
	Alice Ryhl <aliceryhl@...gle.com>,
	Andreas Hindborg <a.hindborg@...sung.com>,
	Benno Lossin <benno.lossin@...ton.me>,
	Björn Roy Baron <bjorn3_gh@...tonmail.com>,
	Christoph Lameter <cl@...ux.com>, Gary Guo <gary@...yguo.net>,
	Miguel Ojeda <ojeda@...nel.org>,
	Pasha Tatashin <pasha.tatashin@...een.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	Wedson Almeida Filho <wedsonaf@...il.com>, linux-mm@...ck.org,
	lkmm@...ts.linux.dev
Subject: Re: [linus:master] [mm]  24e44cc22a:
 BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc

On Tue, Jul 23, 2024 at 02:14:00PM -0700, Boqun Feng wrote:
> On Mon, Jul 22, 2024 at 10:50:53PM -0700, Dennis Zhou wrote:
> > On Mon, Jul 22, 2024 at 01:53:52PM -0700, Boqun Feng wrote:
> > > On Mon, Jul 22, 2024 at 11:27:48AM -0700, Dennis Zhou wrote:
> > > > Hello,
> > > > 
> > > > On Mon, Jul 22, 2024 at 11:03:00AM -0700, Boqun Feng wrote:
> > > > > On Mon, Jul 22, 2024 at 07:52:22AM -1000, Tejun Heo wrote:
> > > > > > On Mon, Jul 22, 2024 at 10:47:30AM -0700, Boqun Feng wrote:
> > > > > > > This looks like a data race because we read pcpu_nr_empty_pop_pages
> > > > > > > outside the lock as a best-effort check. @Tejun, could you confirm
> > > > > > > this?
> > > > > > 
> > > > > > That does sound plausible.
> > > > > > 
> > > > > > > -       if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > > > > > > +       /*
> > > > > > > +        * Checks pcpu_nr_empty_pop_pages out of the pcpu_lock, data races may
> > > > > > > +        * occur but this is just a best-effort checking, everything is synced
> > > > > > > +        * in pcpu_balance_work.
> > > > > > > +        */
> > > > > > > +       if (data_race(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
> > > > > > >                 pcpu_schedule_balance_work();
> > > > > > 
> > > > > > Would it be better to use READ/WRITE_ONCE() for the variable?
> > > > > > 
> > > > > 
> > > > > For READ/WRITE_ONCE(), we would need to replace all write accesses and
> > > > > all out-of-lock read accesses to pcpu_nr_empty_pop_pages, like below.
> > > > > It's better in that it doesn't rely on compiler behavior under
> > > > > data races; I'm not sure about the performance impact, though.
> > > > > 
> > > > 
> > > > I think a better alternative is to move the check up under the lock at
> > > > area_found. The value gets updated as part of pcpu_alloc_area() as the
> > > > code above populates percpu memory that is already allocated.
> > > > 
> > > 
> > > I'm not sure I followed exactly what you suggested here, since I'm not
> > > familiar with the logic, but a simpler version would be:
> > > 
> > > 
> > 
> > I believe that's the only naked access of pcpu_nr_empty_pop_pages, so
> > I think this will fix the problem.
> > 
> > I also don't know how to rerun this CI, though.
> > 
> > ---
> > diff --git a/mm/percpu.c b/mm/percpu.c
> > index 20d91af8c033..325fb8412e90 100644
> > --- a/mm/percpu.c
> > +++ b/mm/percpu.c
> > @@ -1864,6 +1864,10 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> >  
> >  area_found:
> >  	pcpu_stats_area_alloc(chunk, size);
> > +
> > +	if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > +		pcpu_schedule_balance_work();
> > +
> 
> But pcpu_chunk_populated() afterwards could modify
> pcpu_nr_empty_pop_pages again; wouldn't this be a behavior change?
> 

It does, but at this point it's a mixed bag: the lock isn't held
continuously while we do all these operations, so the value is only
ever read on a best-effort basis.
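
Tejun's READ_ONCE()/WRITE_ONCE() suggestion earlier in the thread amounts to marking every write and every lockless read of the counter. A rough userspace analogue, assuming C11 relaxed atomics as a stand-in for those kernel macros (roughly comparable, not identical), might look like this. pool_lock, nr_empty_pop_pages, EMPTY_POP_PAGES_LOW, consume_empty_pages() and needs_balance() are hypothetical names, not mm/percpu.c code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names; a userspace model, not the mm/percpu.c code. */
#define EMPTY_POP_PAGES_LOW 2

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Writers hold pool_lock, but the lockless reader below may run
 * concurrently. Relaxed atomic accesses make that race well defined,
 * roughly the role READ_ONCE()/WRITE_ONCE() play in the kernel.
 */
static atomic_int nr_empty_pop_pages = 4;

/* Writer side: every update is made with the lock held. */
static void consume_empty_pages(int n)
{
	pthread_mutex_lock(&pool_lock);
	int cur = atomic_load_explicit(&nr_empty_pop_pages,
				       memory_order_relaxed);
	atomic_store_explicit(&nr_empty_pop_pages, cur - n,
			      memory_order_relaxed);
	pthread_mutex_unlock(&pool_lock);
}

/*
 * Reader side: a best-effort peek without the lock. The result may be
 * stale by the time the caller acts on it; whatever work it triggers
 * has to take the lock and look again.
 */
static bool needs_balance(void)
{
	return atomic_load_explicit(&nr_empty_pop_pages,
				    memory_order_relaxed) < EMPTY_POP_PAGES_LOW;
}
```

The cost Boqun mentioned shows up here too: every access site has to change, whereas data_race() only annotates the single lockless read.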

Ultimately, the code below populates backing pages for non-atomic
allocations. The ideal case at this point is that we're using an
already populated page. There are caveats, but I can't say the previous
placement is any better than this version.

The pcpu_chunk_populated() call you mentioned pairs with the comment at
line 916 of mm/percpu.c, quoted below:

	/*
	 * If the allocation is not atomic, some blocks may not be
	 * populated with pages, while we account it here.  The number
	 * of pages will be added back with pcpu_chunk_populated()
	 * when populating pages.
	 */
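
To make the accounting in that comment concrete, here is a toy model that assumes whole-page granularity (the real allocator tracks finer-grained bitmap blocks). struct chunk_model, model_alloc_area(), model_populate() and count_empty_populated() are hypothetical. Allocation subtracts every previously free page it touches, populated or not; population then adds back the pages that just gained memory, so only already-populated pages stay subtracted.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of one chunk; not the mm/percpu.c code. */
#define NR_PAGES 8

struct chunk_model {
	bool populated[NR_PAGES]; /* page has backing memory */
	bool in_use[NR_PAGES];    /* page overlaps an allocation */
};

static int nr_empty_pop_pages; /* stands in for pcpu_nr_empty_pop_pages */

/* Allocation: subtract every previously free page, populated or not. */
static void model_alloc_area(struct chunk_model *c, int start, int end)
{
	for (int i = start; i < end; i++) {
		if (!c->in_use[i])
			nr_empty_pop_pages--;
		c->in_use[i] = true;
	}
}

/* Population: add back pages that just gained memory. */
static void model_populate(struct chunk_model *c, int start, int end)
{
	for (int i = start; i < end; i++) {
		if (!c->populated[i]) {
			c->populated[i] = true;
			nr_empty_pop_pages++;
		}
	}
}

/* Ground truth for checking the counter: populated and not in use. */
static int count_empty_populated(const struct chunk_model *c)
{
	int n = 0;

	for (int i = 0; i < NR_PAGES; i++)
		if (c->populated[i] && !c->in_use[i])
			n++;
	return n;
}
```

Starting from four populated free pages (0 to 3), allocating pages 2 to 5 temporarily drops the counter to 0, and populating pages 4 and 5 brings it back to 2, matching the two pages (0 and 1) that are still populated and free.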

Thanks,
Dennis

> Regards,
> Boqun
> 
> >  	spin_unlock_irqrestore(&pcpu_lock, flags);
> >  
> >  	/* populate if not all pages are already there */
> > @@ -1891,9 +1895,6 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> >  		mutex_unlock(&pcpu_alloc_mutex);
> >  	}
> >  
> > -	if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
> > -		pcpu_schedule_balance_work();
> > -
> >  	/* clear the areas and return address relative to base address */
> >  	for_each_possible_cpu(cpu)
> >  		memset((void *)pcpu_chunk_addr(chunk, cpu, 0) + off, 0, size);
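
The patch moves the threshold check so it runs with pcpu_lock held at area_found, which removes the only lockless read instead of annotating it. A hypothetical userspace model of that ordering follows, with a pthread mutex standing in for the spinlock; alloc_model(), chunk_populated(), schedule_balance_work() and the counter values are illustrative assumptions, not kernel code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of the patched ordering. */
#define EMPTY_POP_PAGES_LOW 2

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static int nr_empty_pop_pages = 4; /* plain int: only touched under pool_lock */
static bool balance_scheduled;     /* also only written under pool_lock */

static void schedule_balance_work(void)
{
	balance_scheduled = true;
}

/* Models pcpu_chunk_populated(): add back pages that gained memory. */
static void chunk_populated(int nr_new)
{
	pthread_mutex_lock(&pool_lock);
	nr_empty_pop_pages += nr_new;
	pthread_mutex_unlock(&pool_lock);
}

static void alloc_model(int nr_free_pages_taken, int nr_unpopulated)
{
	pthread_mutex_lock(&pool_lock);
	/* area_found: account the area, populated or not */
	nr_empty_pop_pages -= nr_free_pages_taken;
	/*
	 * Patched position: the read happens with pool_lock held, so it
	 * cannot race with the writers here or in chunk_populated().
	 */
	if (nr_empty_pop_pages < EMPTY_POP_PAGES_LOW)
		schedule_balance_work();
	pthread_mutex_unlock(&pool_lock);

	/* non-atomic path: populate missing pages outside the lock */
	if (nr_unpopulated > 0)
		chunk_populated(nr_unpopulated);
}
```

With alloc_model(4, 2) the check fires under the lock (0 < 2) even though the counter is back to 2 once population finishes; the old position, after population, would have seen 2 and skipped the balance work. That is the behavior change raised above.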
