linux-kernel - Re: [RFC] mm/vmscan.c: avoid possible long latency caused by too_many

Message-ID: <CAOUHufbVmsvWQ-_PSn8CCanuJqRR6Tmj01s17WvKsc3pRa87xw@mail.gmail.com>
Date:   Thu, 22 Apr 2021 14:30:04 -0600
From:   Yu Zhao <yuzhao@...gle.com>
To:     Tim Chen <tim.c.chen@...ux.intel.com>
Cc:     Xing Zhengjun <zhengjun.xing@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux-MM <linux-mm@...ck.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Huang Ying <ying.huang@...el.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Michal Hocko <mhocko@...e.com>, wfg@...l.ustc.edu.cn
Subject: Re: [RFC] mm/vmscan.c: avoid possible long latency caused by too_many_isolated()

On Thu, Apr 22, 2021 at 2:17 PM Tim Chen <tim.c.chen@...ux.intel.com> wrote:
>
>
>
> On 4/22/21 10:13 AM, Yu Zhao wrote:
>
> > @@ -3302,6 +3252,7 @@ static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> >  unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> >                               gfp_t gfp_mask, nodemask_t *nodemask)
> >  {
> > +     int nr_cpus;
> >       unsigned long nr_reclaimed;
> >       struct scan_control sc = {
> >               .nr_to_reclaim = SWAP_CLUSTER_MAX,
> > @@ -3334,8 +3285,17 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> >       set_task_reclaim_state(current, &sc.reclaim_state);
> >       trace_mm_vmscan_direct_reclaim_begin(order, sc.gfp_mask);
> >
> > +     nr_cpus = current_is_kswapd() ? 0 : num_online_cpus();
> > +     while (nr_cpus && !atomic_add_unless(&pgdat->nr_reclaimers, 1, nr_cpus)) {
> > +             if (schedule_timeout_killable(HZ / 10))
>
> 100 msec seems like a long time to wait.  The original code in shrink_inactive_list
> chose a 100 msec sleep because the sleep happens only once in the while loop and 100 msec was
> used to check for stalling.  In this case the loop can go on for a while and the
> number of reclaimers can drop below the limit sooner than 100 msec.  Seems like it should be
> checked more often.

You are not looking at the original code -- the original code sleeps
indefinitely. It was changed by commit db73ee0d46 to fix a problem
that doesn't apply to the code above.

HZ/10 is purely arbitrary, but that's OK because we assume that normally
nobody hits it. If you do hit it often, we need to figure out why, and how
to avoid hitting it so often.
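
For readers unfamiliar with the pattern in the quoted hunk, here is a minimal
userspace sketch of the add-unless-at-limit gate, written with C11 atomics
rather than the kernel API. It is only an illustration: add_unless() is a
stand-in for the kernel's atomic_add_unless(), the helper names
reclaimer_try_enter() and reclaimer_exit() are hypothetical, and the release
side is an assumption because the quoted hunk does not show it. In the patch,
a caller that fails to get a slot sleeps for up to HZ / 10 and retries; the
sketch leaves the sleep out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel's atomic_add_unless(): add `a` to *v
 * unless *v currently equals `u`. Returns true if the add happened.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, cur is reloaded with the current value of *v. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/* Hypothetical helper: try to take one of `limit` reclaimer slots. */
static bool reclaimer_try_enter(atomic_int *nr_reclaimers, int limit)
{
	return add_unless(nr_reclaimers, 1, limit);
}

/* Hypothetical helper: give the slot back (assumed release side). */
static void reclaimer_exit(atomic_int *nr_reclaimers)
{
	int old = atomic_fetch_sub(nr_reclaimers, 1);

	assert(old > 0);
}
```

With a limit of 2, the first two calls to reclaimer_try_enter() succeed and a
third fails until some caller runs reclaimer_exit(). The patch uses
num_online_cpus() as the limit and skips the gate entirely for kswapd by
setting nr_cpus to 0.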