linux-kernel - Re: [PATCH] mm: memcontrol: prevent starvation when writing memory.high

Message-ID: <CALvZod4qKrwvT7MAKrhemjrBfAAsk=fKa9g8QRij42j0CaF4nw@mail.gmail.com>
Date:   Tue, 12 Jan 2021 12:28:04 -0800
From:   Shakeel Butt <shakeelb@...gle.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Tejun Heo <tj@...nel.org>, Roman Gushchin <guro@...com>,
        Michal Hocko <mhocko@...e.com>, Linux MM <linux-mm@...ck.org>,
        Cgroups <cgroups@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH] mm: memcontrol: prevent starvation when writing memory.high

On Tue, Jan 12, 2021 at 11:55 AM Johannes Weiner <hannes@...xchg.org> wrote:
>
> On Tue, Jan 12, 2021 at 10:59:58AM -0800, Shakeel Butt wrote:
> > On Tue, Jan 12, 2021 at 9:12 AM Johannes Weiner <hannes@...xchg.org> wrote:
> > >
> > > When a value is written to a cgroup's memory.high control file, the
> > > write() context first tries to reclaim the cgroup to size before
> > > putting the limit in place for the workload. Concurrent charges from
> > > the workload can keep such a write() looping in reclaim indefinitely.
> > >
> >
> > Has this been observed on a real workload?
>
> Yes.
>
> On several production hosts running a particularly aggressive
> workload, we've observed writers to memory.high getting stuck for
> minutes while consuming a significant amount of CPU.
>

It would be good to add this to the commit message, or at least to
mention that it happened in production.

> > Any particular reason to remove !reclaimed?
>
> Its purpose so far was to allow successful reclaim to continue
> indefinitely, while restricting no-progress loops to 'nr_retries'.
>
> Without the first part, it doesn't really matter whether reclaim is
> making progress or not: we do a maximum of 'nr_retries' loops until
> the cgroup size meets the new limit, then exit one way or another.

Does it make sense to add this to the commit message as well? I am
fine either way.

For the patch:
Reviewed-by: Shakeel Butt <shakeelb@...gle.com>