linux-kernel - Re: [PATCH] mm, memcg: reclaim more aggressively before high allocator throttling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALvZod62+Me-n_54W+GiJrXafwFOzd8nfvkLmOBM190sEGJF=g@mail.gmail.com>
Date:   Thu, 28 May 2020 14:02:24 -0700
From:   Shakeel Butt <shakeelb@...gle.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Chris Down <chris@...isdown.name>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Tejun Heo <tj@...nel.org>, Michal Hocko <mhocko@...nel.org>,
        Linux MM <linux-mm@...ck.org>,
        Cgroups <cgroups@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH] mm, memcg: reclaim more aggressively before high
 allocator throttling

On Thu, May 28, 2020 at 1:30 PM Johannes Weiner <hannes@...xchg.org> wrote:
>
> On Thu, May 28, 2020 at 08:48:31PM +0100, Chris Down wrote:
> > Shakeel Butt writes:
> > > What was the initial reason to have different behavior in the first place?
> >
> > This differing behaviour is simply a mistake, it was never intended to be
> > this deviate from what happens elsewhere. To that extent this patch is as
> > much a bug fix as it is an improvement.
>
> Yes, it was an oversight.
>
> > > >  static void high_work_func(struct work_struct *work)
> > > > @@ -2378,16 +2384,20 @@ void mem_cgroup_handle_over_high(void)
> > > >  {
> > > >         unsigned long penalty_jiffies;
> > > >         unsigned long pflags;
> > > > +       unsigned long nr_reclaimed;
> > > >         unsigned int nr_pages = current->memcg_nr_pages_over_high;
> > >
> > > Is there any benefit to keep current->memcg_nr_pages_over_high after
> > > this change? Why not just use SWAP_CLUSTER_MAX?
>
> It's there for the same reason why try_to_free_pages() takes a reclaim
> argument in the first place: we want to make the thread allocating the
> most also do the most reclaim work. Consider a thread allocating THPs
> in a loop with another thread allocating regular pages.
>
> Remember that all callers loop. They could theoretically all just ask
> for SWAP_CLUSTER_MAX pages over and over again.
>
> The idea is to have fairness in most cases, and avoid allocation
> failure, premature OOM, and containment failure in the edge cases that
> are caused by the inherent raciness of page reclaim.
>

Thanks for the explanation.

> > I don't feel strongly either way, but current->memcg_nr_pages_over_high can
> > be very large for large allocations.
> >
> > That said, maybe we should just reclaim `max(SWAP_CLUSTER_MAX, current -
> > high)` for each loop? I agree that with this design it looks like perhaps we
> > don't need it any more.
> >
> > Johannes, what do you think?
>
> How about this:
>
> Reclaim memcg_nr_pages_over_high in the first iteration, then switch
> to SWAP_CLUSTER_MAX in the retries.
>
> This acknowledges that while the page allocator and memory.max reclaim
> every time an allocation is made, memory.high is currently batched and
> can have larger targets. We want the allocating thread to reclaim at
> least the batch size, but beyond that only what's necessary to prevent
> premature OOM or failing containment.
>
> Add a comment stating as much.
>
> Once we reclaim memory.high synchronously instead of batched, this
> exceptional handling is no longer needed and can be deleted again.
>
> Does that sound reasonable?

SGTM. It does not seem controversial to me to let the task do the work
to resolve the condition for which it is being throttled.