linux-kernel - Re: [RFC] mm/vmscan: add periodic slab shrinker

Message-Id: <CE7BB198-BF09-4D9F-AE99-85324B81E472@linux.dev>
Date:   Sat, 2 Apr 2022 10:54:36 -0700
From:   Roman Gushchin <roman.gushchin@...ux.dev>
To:     Hillf Danton <hdanton@...a.com>
Cc:     MM <linux-mm@...ck.org>, Matthew Wilcox <willy@...radead.org>,
        Dave Chinner <david@...morbit.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Stephen Brennan <stephen.s.brennan@...cle.com>,
        Yu Zhao <yuzhao@...gle.com>,
        David Hildenbrand <david@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] mm/vmscan: add periodic slab shrinker

Hello Hillf!

Thank you for sharing it, really interesting! I’m actually working on the same problem. 

No code to share yet, but here are some of my thoughts:
1) If there is “natural” memory pressure, no additional slab scanning is needed.
2) From a power perspective, it’s better to scan more at once, but less often.
3) Maybe we need a feedback loop with the slab allocator: e.g. if slabs are almost full, it makes more sense to do proactive scanning and free up some memory; otherwise we’ll end up allocating more slabs. But it’s tricky.
4) If scanning does not result in any memory being reclaimed, maybe we should (temporarily) exclude the corresponding shrinker from scanning.

Thanks!

> On Apr 2, 2022, at 12:21 AM, Hillf Danton <hdanton@...a.com> wrote:
> 
> To mitigate the pain of having "several millions" of negative dentries in
> a single directory [1], for example, add a periodic slab shrinker that
> runs independently of the direct and background reclaimers in a bid to
> recycle slab objects that have been cold for more than 30 seconds.
> 
> Q: Why is it needed?
> A: kswapd may take a nap as long as 30 minutes.
> 
> Add a periodic flag to shrink_control to let cache owners know that this
> is the periodic shrinker, which is equivalent to the regular one running
> at the lowest reclaim priority, and that they are free to take no action
> unless one-off objects are piling up.
> 
> For discussion only at this point.
> 
> Hillf
> 
> [1] https://lore.kernel.org/linux-fsdevel/20220209231406.187668-1-stephen.s.brennan@oracle.com/
> 
> --- x/include/linux/shrinker.h
> +++ y/include/linux/shrinker.h
> @@ -14,6 +14,7 @@ struct shrink_control {
> 
>    /* current node being shrunk (for NUMA aware shrinkers) */
>    int nid;
> +    int periodic;
> 
>    /*
>     * How many objects scan_objects should scan and try to reclaim.
> --- x/mm/vmscan.c
> +++ y/mm/vmscan.c
> @@ -781,6 +781,8 @@ static unsigned long do_shrink_slab(stru
>        scanned += shrinkctl->nr_scanned;
> 
>        cond_resched();
> +        if (shrinkctl->periodic)
> +            break;
>    }
> 
>    /*
> @@ -906,7 +908,8 @@ static unsigned long shrink_slab_memcg(g
>  */
> static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>                 struct mem_cgroup *memcg,
> -                 int priority)
> +                 int priority,
> +                 int periodic)
> {
>    unsigned long ret, freed = 0;
>    struct shrinker *shrinker;
> @@ -929,6 +932,7 @@ static unsigned long shrink_slab(gfp_t g
>            .gfp_mask = gfp_mask,
>            .nid = nid,
>            .memcg = memcg,
> +            .periodic = periodic,
>        };
> 
>        ret = do_shrink_slab(&sc, shrinker, priority);
> @@ -952,7 +956,7 @@ out:
>    return freed;
> }
> 
> -static void drop_slab_node(int nid)
> +static void drop_slab_node(int nid, int periodic)
> {
>    unsigned long freed;
>    int shift = 0;
> @@ -966,19 +970,31 @@ static void drop_slab_node(int nid)
>        freed = 0;
>        memcg = mem_cgroup_iter(NULL, NULL, NULL);
>        do {
> -            freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
> +            freed += shrink_slab(GFP_KERNEL, nid, memcg, 0, periodic);
>        } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
>    } while ((freed >> shift++) > 1);
> }
> 
> -void drop_slab(void)
> +static void __drop_slab(int periodic)
> {
>    int nid;
> 
>    for_each_online_node(nid)
> -        drop_slab_node(nid);
> +        drop_slab_node(nid, periodic);
> +}
> +
> +void drop_slab(void)
> +{
> +    __drop_slab(0);
> }
> 
> +static void periodic_slab_shrinker_workfn(struct work_struct *work)
> +{
> +    __drop_slab(1);
> +    queue_delayed_work(system_unbound_wq, to_delayed_work(work), 30*HZ);
> +}
> +static DECLARE_DELAYED_WORK(periodic_slab_shrinker, periodic_slab_shrinker_workfn);
> +
> static inline int is_page_cache_freeable(struct folio *folio)
> {
>    /*
> @@ -3098,7 +3114,7 @@ static void shrink_node_memcgs(pg_data_t
>        shrink_lruvec(lruvec, sc);
> 
>        shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
> -                sc->priority);
> +                sc->priority, 0);
> 
>        /* Record the group's reclaim efficiency */
>        vmpressure(sc->gfp_mask, memcg, false,
> @@ -4354,8 +4370,11 @@ static void kswapd_try_to_sleep(pg_data_
>         */
>        set_pgdat_percpu_threshold(pgdat, calculate_normal_threshold);
> 
> -        if (!kthread_should_stop())
> +        if (!kthread_should_stop()) {
> +            queue_delayed_work(system_unbound_wq,
> +                        &periodic_slab_shrinker, 60*HZ);
>            schedule();
> +        }
> 
>        set_pgdat_percpu_threshold(pgdat, calculate_pressure_threshold);
>    } else {
> --
>