Message-Id: <20180629125218.GX3593@linux.vnet.ibm.com>
Date: Fri, 29 Jun 2018 05:52:18 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Michal Hocko <mhocko@...nel.org>
Cc: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm,oom: Bring OOM notifier callbacks to outside of OOM killer.
On Fri, Jun 29, 2018 at 11:04:19AM +0200, Michal Hocko wrote:
> On Thu 28-06-18 14:31:05, Paul E. McKenney wrote:
> > On Thu, Jun 28, 2018 at 01:39:42PM +0200, Michal Hocko wrote:
> > > On Wed 27-06-18 07:31:25, Paul E. McKenney wrote:
> > > > On Wed, Jun 27, 2018 at 09:22:07AM +0200, Michal Hocko wrote:
> > > > > On Tue 26-06-18 10:03:45, Paul E. McKenney wrote:
> > > > > [...]
> > > > > > 3. Something else?
> > > > >
> > > > > > How hard would it be to use a different API than oom notifiers? E.g. a
> > > > > shrinker which just kicks all the pending callbacks if the reclaim
> > > > > priority reaches low values (e.g. 0)?
> > > >
> > > > Beats me. What is a shrinker? ;-)
> > >
> > > > This is a generic mechanism to reclaim memory that is not on standard
> > > LRU lists. Lwn.net surely has some nice coverage (e.g.
> > > https://lwn.net/Articles/548092/).
> >
> > "In addition, there is little agreement over what a call to a shrinker
> > really means or how the called subsystem should respond." ;-)
> >
> > Is this set up using register_shrinker() in mm/vmscan.c? I am guessing
>
> Yes, exactly. You are supposed to implement the two methods in struct
> shrinker
>
> > that the many mentions of shrinker in DRM are irrelevant.
> >
> > If my guess is correct, the API seems a poor fit for RCU. I can
> > produce an approximate number of RCU callbacks for ->count_objects(),
> > but a given callback might free a lot of memory or none at all. Plus,
> > to actually have ->scan_objects() free them before returning, I would
> > need to use something like rcu_barrier(), which might involve longer
> > > delays than desired.
>
> Well, I am not yet sure how good a fit this is because I still do not
> understand the underlying problem your notifier is trying to solve. So I
> will get back to this once that is settled.
> >
> > Or am I missing something here?
> >
> > > > More seriously, could you please point me at an exemplary shrinker
> > > > use case so I can see what is involved?
> > >
> > > > Well, I am not really sure what the objective of the oom notifier
> > > > is, so I cannot point you in the right direction. IIUC you just want
> > > > to kick callbacks to be handled sooner under heavy memory pressure,
> > > > right? How is that achieved? Kick a worker?
> >
> > That is achieved by enqueuing a non-lazy callback on each CPU's callback
> > list, but only for those CPUs having non-empty lists. This causes
> > CPUs with lists containing only lazy callbacks to be more aggressive,
> > in particular, it prevents such CPUs from hanging out idle for seconds
> > at a time while they have callbacks on their lists.
> >
> > The enqueuing happens via an IPI to the CPU in question.
>
> I am afraid this is too low level for me to understand what is going on
> here. What are lazy callbacks and why do they need any specific action
> when we are getting close to OOM? I mean, I do understand that we might
> have many callers of call_rcu and free memory lazily. But there is quite
> a long way before we start the reclaim until we reach the OOM killer path.
> So why don't those callbacks get called during that time period? How are
> they triggered when we are not hitting the OOM path? They surely cannot
> sit there forever, right? Can we trigger them sooner? Maybe the
> shrinker is not the best fit but we have a retry feedback loop in the page
> allocator, maybe we can kick this processing from there.

The effect of RCU's current OOM code is to speed up callback invocation
by at most a few seconds (assuming no stalled CPUs, in which case
it is not possible to speed up callback invocation).

Given that, I should just remove RCU's OOM code entirely?

							Thanx, Paul
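
For readers following the shrinker discussion quoted above, here is a
rough userspace sketch of why a callback count is a weak proxy for
reclaimable memory. It is not kernel code and does not use the real
shrinker interface: the toy_* names and the bytes_freed_when_invoked
field are hypothetical, and the sketch ignores grace periods entirely
(real callbacks cannot simply be invoked on demand, which is where
rcu_barrier() would come in).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued callback: invoking it may
 * release a lot of memory or none at all. */
struct toy_cb {
	size_t bytes_freed_when_invoked;
};

/* Hypothetical pending-callback list, consumed from the head. */
struct toy_cb_list {
	const struct toy_cb *cbs;
	size_t len;		/* callbacks still pending */
};

/* Analogue of ->count_objects(): all we can report is how many
 * callbacks are pending, not how much memory they would free. */
static unsigned long toy_count_objects(const struct toy_cb_list *l)
{
	return l->len;
}

/* Analogue of ->scan_objects(): invoke up to nr_to_scan callbacks.
 * Returns the number invoked; the bytes actually released are
 * reported separately through bytes_out. */
static unsigned long toy_scan_objects(struct toy_cb_list *l,
				      unsigned long nr_to_scan,
				      size_t *bytes_out)
{
	unsigned long done = 0;
	size_t bytes = 0;

	assert(bytes_out != NULL);
	while (done < nr_to_scan && l->len > 0) {
		bytes += l->cbs->bytes_freed_when_invoked;
		l->cbs++;
		l->len--;
		done++;
	}
	*bytes_out = bytes;
	return done;
}
```

With three pending callbacks of which only the last frees a page,
scanning two of them reports two objects handled and zero bytes
released, which is the mismatch described in the quoted text.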