

Message-ID: <CAJuCfpHKMamMfw2SW0QnJv_bu4CYLgbHuL0nJ2kwPc8D+44K3w@mail.gmail.com>
Date:   Wed, 6 Dec 2017 17:27:19 -0800
From:   Suren Baghdasaryan <surenb@...gle.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     mhocko@...e.com, Johannes Weiner <hannes@...xchg.org>,
        hillf.zj@...baba-inc.com, minchan@...nel.org,
        mgorman@...hsingularity.net, ying.huang@...el.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Tim Murray <timmurray@...gle.com>, Todd Kjos <tkjos@...gle.com>
Subject: Re: [PATCH] mm: terminate shrink_slab loop if signal is pending

>
> Some quantification of "quite time consuming" and "delay" would be
> interesting, please.
>

Unfortunately, that depends on the implementation of the shrinkers
registered in the system, including the ones from drivers. I've
captured traces showing delays of up to 100ms where a process with a
pending SIGKILL is in direct memory reclaim and signal handling is
delayed because of that. I realize that it's not the fault of
shrink_slab_lmk() that some shrinkers take a long time to shrink their
slabs (sometimes for justifiable reasons and sometimes because of a
bug which has to be fixed), but this can be a safeguard against such
cases.
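
To illustrate the safeguard, here is a minimal userspace sketch (not the
patch itself) of a shrinker walk that bails out once a fatal signal is
pending. Everything in it is hypothetical: struct model_shrinker,
model_shrink_slab() and the kill_at parameter are made up for the model,
and the fatal_pending flag only stands in for whatever check the kernel
loop would actually use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered shrinker: only its cost matters. */
struct model_shrinker {
    unsigned int cost_us;   /* modeled time one invocation takes */
};

/*
 * Walk the shrinker list and return the modeled time spent in reclaim.
 * kill_at is the index of the first shrinker that would run after the
 * fatal signal became pending (pass nr for "never").
 * With check_signal set, the walk stops as soon as the signal is seen;
 * without it, every shrinker runs before the signal can be handled.
 */
unsigned int model_shrink_slab(const struct model_shrinker *shrinkers,
                               size_t nr, size_t kill_at, bool check_signal)
{
    unsigned int spent_us = 0;

    for (size_t i = 0; i < nr; i++) {
        bool fatal_pending = (i >= kill_at);   /* assumption: modeled flag */

        if (check_signal && fatal_pending)
            break;                             /* stop reclaiming, go die */
        spent_us += shrinkers[i].cost_us;
    }
    return spent_us;
}
```

In this model the time between the signal arriving and the loop
returning is bounded by the one shrinker that is already running,
instead of by the sum of all remaining shrinkers.
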
A couple of shrinker examples that I found most time-consuming (most of
that 100ms delay comes from the first two):

https://patchwork.kernel.org/patch/10096641/
The patch fixes the dm-bufio shrinker, which under certain conditions
reclaims only one buffer per scan, making the shrinking process very
inefficient.

https://android.googlesource.com/kernel/msm/+/android-7.1.0_r0.2/drivers/gpu/msm/kgsl_pool.c#420
This example is from a driver whose shrinker returns 0 instead of
SHRINK_STOP when it's unable to reclaim any more. As a result, when
total_scan in do_shrink_slab() is large, this causes multiple
scan_objects() calls with no memory being reclaimed. A patch for this
one is under review by the owners.
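
The effect of returning 0 versus SHRINK_STOP can be seen in a simplified
userspace model of the batching loop. This is only a sketch under
assumptions: model_do_shrink_slab(), scan_returns_zero() and
scan_returns_stop() are hypothetical names, the loop is reduced to the
part that matters here, and SHRINK_STOP is defined locally with the same
value the kernel uses (~0UL).

```c
#include <stddef.h>

#define SHRINK_STOP (~0UL)   /* same value as the kernel's definition */

/* Hypothetical, reduced form of a shrinker's scan callback. */
typedef unsigned long (*scan_fn)(unsigned long nr_to_scan);

/* Models the buggy driver: nothing reclaimable, but reports "freed 0". */
unsigned long scan_returns_zero(unsigned long nr_to_scan)
{
    (void)nr_to_scan;
    return 0;
}

/* Models the fixed driver: nothing reclaimable, asks the caller to stop. */
unsigned long scan_returns_stop(unsigned long nr_to_scan)
{
    (void)nr_to_scan;
    return SHRINK_STOP;
}

/*
 * Simplified batching loop: keep calling scan() in batch_size chunks
 * until total_scan is used up or the callback returns SHRINK_STOP.
 * Returns the number of objects freed; *calls receives the number of
 * scan() invocations that were made.
 */
unsigned long model_do_shrink_slab(scan_fn scan, unsigned long total_scan,
                                   unsigned long batch_size,
                                   unsigned long *calls)
{
    unsigned long freed = 0;

    *calls = 0;
    if (batch_size == 0)
        return 0;            /* guard: avoid looping forever in the model */

    while (total_scan >= batch_size) {
        unsigned long ret = scan(batch_size);

        (*calls)++;
        if (ret == SHRINK_STOP)
            break;           /* callback says further scanning is useless */
        freed += ret;
        total_scan -= batch_size;
    }
    return freed;
}
```

With total_scan = 1024 and batch_size = 128, the callback that returns 0
is invoked 8 times and frees nothing, while the one that returns
SHRINK_STOP is invoked once. The wasted calls grow linearly with
total_scan.
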

A shrinker that seems to be justifiably heavy is super_cache_scan()
in fs/super.c. I have traces where it takes up to 4ms to complete.