Message-ID: <20200422153503.GQ17661@paulmck-ThinkPad-P72>
Date:   Wed, 22 Apr 2020 08:35:03 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Joel Fernandes <joel@...lfernandes.org>,
        Uladzislau Rezki <urezki@...il.com>,
        linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        rcu@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH RFC] rcu/tree: Refactor object allocation and try harder
 for array allocation

On Wed, Apr 22, 2020 at 10:57:52AM -0400, Johannes Weiner wrote:
> On Thu, Apr 16, 2020 at 11:01:00AM -0700, Paul E. McKenney wrote:
> > On Thu, Apr 16, 2020 at 09:17:45AM -0400, Joel Fernandes wrote:
> > > On Thu, Apr 16, 2020 at 12:30:07PM +0200, Uladzislau Rezki wrote:
> > > > I have a question about dynamic attaching of the rcu_head. Do you think
> > > > that we should drop it? We have it because it requires only 8 +
> > > > sizeof(struct rcu_head) bytes, and it is used when we cannot allocate the
> > > > full page needed for the pointer array. Dynamic attaching can succeed
> > > > where the page allocation fails because it goes through the SLAB allocator
> > > > and requests much less memory than one page, giving a higher chance of
> > > > bypassing synchronize_rcu() and inline freeing in the caller's context.
> > > > 
> > > > I agree that we should not use those GFP_* flags; instead we could go with
> > > > GFP_NOWAIT | __GFP_NOWARN for the head-attaching case only, and also drop
> > > > GFP_ATOMIC so that the atomic reserves are kept for other users.
> > 
> > I must defer to people who understand the GFP flags better than I do.
> > The suggestion of __GFP_RETRY_MAYFAIL for no memory pressure (or maybe
> > when the CPU's reserve is not yet full) and __GFP_NORETRY otherwise came
> > from one of these people.  ;-)
> 
> The exact flags we want here depends somewhat on the rate and size of
> kfree_rcu() bursts we can expect. We may want to start with one set
> and instrument allocation success rates.
> 
> Memory tends to be fully consumed by the filesystem cache, so some
> form of light reclaim is necessary for almost all allocations.
> 
> GFP_NOWAIT won't do any reclaim by itself, but it'll wake kswapd.
> Kswapd maintains a small pool of free pages so that even allocations
> that are allowed to enter reclaim usually don't have to. It would be
> safe for RCU to dip into that.
> 
> However, there are some cons to using it:
> 
> - Depending on kfree_rcu() burst size, this pool can be exhausted (it's
> usually about half a percent of memory, but is affected by sysctls),
> after which NOWAIT allocations fail until kswapd has caught up.
> 
> - This pool is shared by all GFP_NOWAIT users, and many (most? all?)
> of them cannot actually sleep. Often they would have to drop locks,
> restart list iterations, or suffer some other form of deterioration to
> work around failing allocations.
> 
> Since rcu wouldn't have anything better to do than sleep at this
> juncture, it may as well join the reclaim effort.
> 
> Using __GFP_NORETRY or __GFP_RETRY_MAYFAIL would allow it to do that
> without exerting too much pressure on the VM.
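
For concreteness, here is a minimal sketch of the flag combinations being
discussed, assuming a context where sleeping is permitted.  This is
illustrative only -- it is not the in-tree kvfree_rcu() code, and the helper
names and the dyn_rcu_head layout are made up for this example:

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

/* Small SLAB-backed header attached to the object when no page is available. */
struct dyn_rcu_head {
	struct rcu_head rh;
	void *ptr;		/* object to free after a grace period */
};

/* Page-sized pointer array, tried first. */
static unsigned long alloc_bulk_page(bool under_pressure)
{
	/*
	 * Join the reclaim effort, but bound it: __GFP_NORETRY under
	 * memory pressure, __GFP_RETRY_MAYFAIL otherwise, and never
	 * warn on failure since there is a fallback path.
	 */
	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN |
		    (under_pressure ? __GFP_NORETRY : __GFP_RETRY_MAYFAIL);

	return __get_free_page(gfp);
}

/* Fallback: dynamically attach an rcu_head via the SLAB allocator. */
static struct dyn_rcu_head *attach_rcu_head(void *ptr)
{
	struct dyn_rcu_head *drh;

	/* No direct reclaim and no warning; kswapd is still woken. */
	drh = kmalloc(sizeof(*drh), GFP_NOWAIT | __GFP_NOWARN);
	if (drh)
		drh->ptr = ptr;
	return drh;
}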

Thank you for looking this over and for the feedback!

Good point on the sleeping.  My assumption has been that sleeping waiting
for a grace period was highly likely to allow memory to eventually be
freed, and that there is a point of diminishing returns beyond which
adding additional tasks to the reclaim effort does not help much.

Here are some strategies right offhand when sleeping is required:

1.	Always sleep in synchronize_rcu() in order to (with high
	probability) free the memory.  This might mean that the reclaim
	effort goes slower than would be good.

2.	Always sleep in the memory allocator in order to help reclaim
	along.	(This is a strawman version of what I expect your
	proposal really is, but putting it here for completeness, please
	see below.)

3.	Always sleep in the memory allocator in order to help reclaim
	along, but return failure at some point.  Then the caller
	invokes synchronize_rcu().  When to return failure?

	o	After some substantial but limited amount of effort has
		been spent on reclaim.

	o	When it becomes likely that further reclaim effort
		is not going to free up additional memory.

I am guessing that you are thinking in terms of specifying GFP flags to
result in some variant of #3.
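
For illustration, strategy #3 might look roughly like the following.  This is
a hypothetical helper, not the actual kfree_call_rcu()/kvfree_rcu()
implementation, and the batching/queuing step is elided:

#include <linux/gfp.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

static void free_after_gp(void *ptr)
{
	void **bulk;

	/*
	 * Spend a bounded amount of reclaim effort on the array
	 * allocation, but accept failure rather than retrying forever.
	 */
	bulk = (void **)__get_free_page(GFP_KERNEL | __GFP_NORETRY |
					__GFP_NOWARN);
	if (bulk) {
		bulk[0] = ptr;
		/* ... queue the page so ptr is kfree()d after a grace period ... */
		return;
	}

	/*
	 * Allocation failed: the caller sleeps for a full grace period
	 * and frees inline, which itself releases memory.
	 */
	synchronize_rcu();
	kfree(ptr);
}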

Or am I missing a trick here?

							Thanx, Paul
