linux-kernel - Re: [RFC] mm/vmscan.c: avoid possible long latency caused by too_many

Message-ID: <YIqDv+dQL73KAqjm@dhcp22.suse.cz>
Date:   Thu, 29 Apr 2021 12:00:31 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Yu Zhao <yuzhao@...gle.com>
Cc:     Xing Zhengjun <zhengjun.xing@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux-MM <linux-mm@...ck.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Huang Ying <ying.huang@...el.com>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Shakeel Butt <shakeelb@...gle.com>, wfg@...l.ustc.edu.cn,
        Rik van Riel <riel@...riel.com>,
        Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [RFC] mm/vmscan.c: avoid possible long latency caused by
 too_many_isolated()

On Wed 28-04-21 09:05:06, Yu Zhao wrote:
> On Wed, Apr 28, 2021 at 5:55 AM Michal Hocko <mhocko@...e.com> wrote:
[...]
> > > @@ -3334,8 +3285,17 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> > >       set_task_reclaim_state(current, &sc.reclaim_state);
> > >       trace_mm_vmscan_direct_reclaim_begin(order, sc.gfp_mask);
> > >
> > > +     nr_cpus = current_is_kswapd() ? 0 : num_online_cpus();
> > > +     while (nr_cpus && !atomic_add_unless(&pgdat->nr_reclaimers, 1, nr_cpus)) {
> > > +             if (schedule_timeout_killable(HZ / 10))
> > > +                     return SWAP_CLUSTER_MAX;
> > > +     }
> > > +
> > >       nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
> > >
> > > +     if (nr_cpus)
> > > +             atomic_dec(&pgdat->nr_reclaimers);
> > > +
> > >       trace_mm_vmscan_direct_reclaim_end(nr_reclaimed);
> > >       set_task_reclaim_state(current, NULL);
> >
> > This will surely break any memcg direct reclaim.
> 
> Mind elaborating how it will "surely" break any memcg direct reclaim?

I was wrong here. I thought this was done in a common path for all direct
reclaimers (I likely mixed up try_to_free_pages with do_try_to_free_pages).
Sorry about the confusion.

Still, I do not think that the above heuristic will work properly.
Different reclaimers have different reclaim targets (e.g. lower zones
and/or a numa node mask) and different strength (e.g. GFP_NOFS vs.
GFP_KERNEL). Simple count-based throttling would be prone to various
sorts of priority inversions.
-- 
Michal Hocko
SUSE Labs