Message-ID: <CAJuCfpE854v=3k4+cK34M5vfg-S25OqSideMSofLOFe4d177Vw@mail.gmail.com>
Date:   Mon, 11 Dec 2017 13:05:50 -0800
From:   Suren Baghdasaryan <surenb@...gle.com>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>, hillf.zj@...baba-inc.com,
        minchan@...nel.org, mgorman@...hsingularity.net,
        ying.huang@...el.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Tim Murray <timmurray@...gle.com>,
        Todd Kjos <tkjos@...gle.com>
Subject: Re: [PATCH v2] mm: terminate shrink_slab loop if signal is pending

On Sat, Dec 9, 2017 at 12:08 AM, Tetsuo Handa
<penguin-kernel@...ove.sakura.ne.jp> wrote:
> Suren Baghdasaryan wrote:
>> On Fri, Dec 8, 2017 at 6:03 AM, Tetsuo Handa
>> <penguin-kernel@...ove.sakura.ne.jp> wrote:
>> >> > >> This change checks for pending
>> >> > >> fatal signals inside shrink_slab loop and if one is detected
>> >> > >> terminates this loop early.
>> >> > >
>> >> > > This changelog doesn't really address my previous review feedback, I am
>> >> > > afraid. You should mention more details about problems you are seeing
>> >> > > and what causes them.
>>
>> The problem I'm facing is that a SIGKILL sent from user space to kill
>> the least important process is delayed long enough for the OOM killer
>> to get a chance to kill something else, possibly a more important
>> process ("important" here being from the user's point of view). So the
>> delay in SIGKILL delivery effectively causes extra kills. Traces
>> indicate that this delay happens when the process being killed is in
>> direct reclaim, and shrinkers (before I fixed them) were the biggest
>> cause of the delay.
>
> So sending SIGKILL from userspace does not release memory fast enough
> to prevent the OOM killer from being invoked? Yes, under memory
> pressure even the attempt to send SIGKILL from userspace could be
> delayed, due to e.g. a page fault.
>

I understand that there will be cases when OOM is unavoidable. I'm
trying to minimize the cases when SIGKILL processing is delayed.
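
For context, the check this patch adds to shrink_slab() amounts to
roughly the following (a simplified sketch; the real loop walks
shrinker_list under shrinker_rwsem and handles memcg-aware shrinkers):

	list_for_each_entry(shrinker, &shrinker_list, list) {
		/*
		 * The current task has been sent SIGKILL and is about
		 * to exit and free its memory; stop shrinking so we
		 * do not delay signal handling any further.
		 */
		if (fatal_signal_pending(current))
			break;

		freed += do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible);
	}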

> Unless it is a memcg OOM, you could try an OOM notifier callback that
> checks whether there are processes with SIGKILL pending and waits for a
> timeout if there are. This situation resembles the timeout-based OOM
> killing discussion, where the OOM killer is enabled again (on the
> assumption that the OOM victim got stuck due to an OOM lockup) after
> some timeout from the previous OOM kill. But since we did not merge the
> timeout-based approach, there is no timestamp field recording when the
> SIGKILL was delivered, so an attempt to wait for a timeout would get messy.
>
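
If I understand correctly, that means a callback registered with
register_oom_notifier(). An untested skeleton of what I think you are
describing (the grace period and the way it suppresses the kill are my
assumptions):

	static int oom_wait_for_killed(struct notifier_block *nb,
				       unsigned long unused, void *parm)
	{
		unsigned long *freed = parm;
		struct task_struct *p;

		rcu_read_lock();
		for_each_process(p) {
			if (fatal_signal_pending(p)) {
				rcu_read_unlock();
				/* Give the victim a chance to exit. */
				schedule_timeout_killable(HZ / 10);
				/* Nonzero "freed" makes out_of_memory() retry. */
				*freed += 1;
				return NOTIFY_OK;
			}
		}
		rcu_read_unlock();
		return NOTIFY_OK;
	}

	static struct notifier_block oom_wait_nb = {
		.notifier_call = oom_wait_for_killed,
	};
	/* ... register_oom_notifier(&oom_wait_nb); ... */
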
> Regardless of whether you try the OOM notifier callback, I think that
> adding __GFP_KILLABLE and allowing allocations to bail out without
> calling out_of_memory() and warn_alloc() will help killed processes
> terminate faster. I think that the majority of memory allocation
> requests can give up upon SIGKILL. Important allocation requests which
> should not give up upon SIGKILL (e.g. committing filesystem metadata
> to storage) can be offloaded to kernel threads.
>
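
If we try it, I assume the flag itself would be defined along these
lines (the bit value is only illustrative; a free ___GFP bit would have
to be picked):

	/* Allocation may bail out early if a fatal signal is pending. */
	#define ___GFP_KILLABLE	0x4000000u
	#define __GFP_KILLABLE	((__force gfp_t)___GFP_KILLABLE)
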
>>
>> >> > > If we have a shrinker which takes a considerable
>> >> > > amount of time then we should be addressing that. If that is not
>> >> > > possible then it should be documented at least.
>>
>> I already submitted patches for a couple of shrinkers. The problem
>> became less pronounced and less frequent, but the retry loop Tetsuo
>> mentioned still visibly delays the delivery. The worst case I've seen
>> after fixing the shrinkers is 43ms.
>
> You meant "delays the termination (of the killed process)" rather than
> "delays the delivery (of SIGKILL)", didn't you?
>

To be more precise, I'm talking about the delay between send_signal()
and get_signal(), or in other words the delay between the moment we
send the SIGKILL and the moment we handle it.
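
One way to observe this interval is the pair of existing signal
tracepoints that bracket it (trimmed from kernel/signal.c):

	/* in __send_signal(), when the signal is queued: */
	trace_signal_generate(sig, info, t, group, result);

	/* in get_signal(), when the victim finally dequeues it: */
	trace_signal_deliver(signr, &ksig->info, ka);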

>> > I agree that making waits/loops killable is generally good. But be sure to be
>> > prepared for the worst case. For example, start __GFP_KILLABLE on a "best effort"
>> > basis (i.e. no guarantee that the allocating thread will leave the page allocator
>> > slowpath immediately) and check for fatal_signal_pending() only if
>> > __GFP_KILLABLE is set. That is,
>> >
>> > +               /*
>> > +                * We are about to die and free our memory.
>> > +                * Stop shrinking which might delay signal handling.
>> > +                */
>> > +               if (unlikely((gfp_mask & __GFP_KILLABLE) && fatal_signal_pending(current)))
>> > +                       break;
>> >
>> > at shrink_slab() etc. and
>> >
>> > +               if ((gfp_mask & __GFP_KILLABLE) && fatal_signal_pending(current))
>> > +                       goto nopage;
>> >
>> > at __alloc_pages_slowpath().
>>
>> I was thinking about something similar and will experiment to see if
>> this solves the problem and whether it has any side effects. Does
>> anyone see any obvious problems with this approach?
>>
>
> It is Michal who thinks that "killability for a particular allocation request sounds
> like a hack" ( http://lkml.kernel.org/r/201606112335.HBG09891.OLFJOFtVMOQHSF@I-love.SAKURA.ne.jp ).
> I'm willing to give up memory allocations in functions called from system
> calls if SIGKILL is pending. Thus, it should be time to try __GFP_KILLABLE.
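
With such a flag in place, opting in at an allocation site would look
something like this (the call site and error handling are illustrative
only):

	buf = kmalloc(size, GFP_KERNEL | __GFP_KILLABLE);
	if (!buf)
		/* Possibly an early bail-out on SIGKILL rather than real OOM. */
		return -ENOMEM;

Will give this a try.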
