Date:   Sun, 10 Dec 2017 11:13:11 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Suren Baghdasaryan <surenb@...gle.com>
Cc:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>, hillf.zj@...baba-inc.com,
        minchan@...nel.org, mgorman@...hsingularity.net,
        ying.huang@...el.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Tim Murray <timmurray@...gle.com>,
        Todd Kjos <tkjos@...gle.com>
Subject: Re: [PATCH v2] mm: terminate shrink_slab loop if signal is pending

On Fri 08-12-17 10:06:26, Suren Baghdasaryan wrote:
> On Fri, Dec 8, 2017 at 6:03 AM, Tetsuo Handa
> <penguin-kernel@...ove.sakura.ne.jp> wrote:
> > Michal Hocko wrote:
> >> On Fri 08-12-17 20:36:16, Tetsuo Handa wrote:
> >> > On 2017/12/08 17:22, Michal Hocko wrote:
> >> > > On Thu 07-12-17 17:23:05, Suren Baghdasaryan wrote:
> >> > >> Slab shrinkers can be quite time consuming and when signal
> >> > >> is pending they can delay handling of the signal. If fatal
> >> > >> signal is pending there is no point in shrinking that process
> >> > >> since it will be killed anyway.
> >> > >
> >> > > The thing is that we are _not_ shrinking _that_ process. We are
> >> > > shrinking globally shared objects and the fact that the memory pressure
> >> > > is so large that the kswapd doesn't keep pace with it means that we have
> >> > > to throttle all allocation sites by doing this direct reclaim. I agree
> >> > > that expediting a killed task is a good thing in general because such
> >> > > a process should free at least some memory.
> 
> Agreed, the wording here is inaccurate. My original intent was to have a
> safeguard against slow shrinkers, but I understand your concern that
> this can mask a real problem in a shrinker. In essence, expediting the
> killing is the ultimate goal here, but as you mentioned, it's not as
> simple as this change.

Moreover, it doesn't work if the SIGKILL can be delivered asynchronously
(which is your case AFAICU). You can already be running the slow
shrinker at that time...
 
[...]
> > I agree that making waits/loops killable is generally good. But be sure to be
> > prepared for the worst case. For example, start __GFP_KILLABLE on a "best effort"
> > basis (i.e. no guarantee that the allocating thread will leave the page allocator
> > slowpath immediately) and check for fatal_signal_pending() only if
> > __GFP_KILLABLE is set. That is,
> >
> > +               /*
> > +                * We are about to die and free our memory.
> > +                * Stop shrinking which might delay signal handling.
> > +                */
> > +               if (unlikely((gfp_mask & __GFP_KILLABLE) && fatal_signal_pending(current)))
> > +                       break;
> >
> > at shrink_slab() etc. and
> >
> > +               if ((gfp_mask & __GFP_KILLABLE) && fatal_signal_pending(current))
> > +                       goto nopage;
> >
> > at __alloc_pages_slowpath().
> 
> I was thinking about something similar and will experiment to see if
> this solves the problem and if it has any side effects. Does anyone see
> any obvious problems with this approach?

Tetsuo has proposed this flag in the past and I've raised objections
to why it is not a great idea. I do not have a link handy, but the core
objection was that the semantics would be too fuzzy. All the allocations
in the same context would have to be killable for this flag to have any
effect. Spreading it all over the kernel is simply not feasible.
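
For context, here is a minimal sketch of what Tetsuo's first hunk would
look like when dropped into a heavily simplified shrink_slab() walk.
__GFP_KILLABLE is hypothetical (it would need a new ___GFP_* bit in
gfp.h), and nothing below is upstream code; there is no shrinker_rwsem
handling or memcg filtering here. It only illustrates the semantics
under discussion:

	/*
	 * Illustration only, not upstream code: __GFP_KILLABLE is a
	 * hypothetical gfp flag, and this function is a simplified
	 * stand-in for the real shrinker walk in mm/vmscan.c.
	 */
	static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
					 struct mem_cgroup *memcg,
					 unsigned long nr_scanned,
					 unsigned long nr_eligible)
	{
		struct shrinker *shrinker;
		unsigned long freed = 0;

		list_for_each_entry(shrinker, &shrinker_list, list) {
			struct shrink_control sc = {
				.gfp_mask = gfp_mask,
				.nid = nid,
				.memcg = memcg,
			};

			/*
			 * Best-effort bail-out: the caller opted in with
			 * __GFP_KILLABLE and a fatal signal is already
			 * pending, so stop walking the (potentially slow)
			 * shrinkers and let the dying task exit and free
			 * its memory sooner.
			 */
			if (unlikely((gfp_mask & __GFP_KILLABLE) &&
				     fatal_signal_pending(current)))
				break;

			freed += do_shrink_slab(&sc, shrinker, nr_scanned,
						nr_eligible);
		}

		return freed;
	}

The second hunk would gate the early "goto nopage" in
__alloc_pages_slowpath() on the same (gfp_mask & __GFP_KILLABLE) &&
fatal_signal_pending(current) condition, so only callers that explicitly
opted in would see the allocation fail early.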

-- 
Michal Hocko
SUSE Labs
