linux-kernel - Re: [PATCH] mm, memcg: reclaim more aggressively before high allocator throttling


Message-ID: <20200529073118.GE4406@dhcp22.suse.cz>
Date:   Fri, 29 May 2020 09:31:18 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Chris Down <chris@...isdown.name>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Tejun Heo <tj@...nel.org>, linux-mm@...ck.org,
        cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
        kernel-team@...com
Subject: Re: [PATCH] mm, memcg: reclaim more aggressively before high
 allocator throttling

On Thu 28-05-20 17:48:48, Chris Down wrote:
> Michal Hocko writes:
> > > We send a simple bug fix: bring this instance of reclaim in line with
> > > how everybody else is using the reclaim API, to meet the semantics as
> > > they are intended and documented.
> > 
> > Here is where we are not on the same page though. Once you have identified
> > that the main problem is that the reclaim fails too early to meet the
> > target then the fix would be to enforce that target. I have asked why
> > this hasn't been done and haven't got any real answer for that. Instead
> > what you call "a simple bug fix" has larger consequences which are not
> > really explained in the changelog and they are also not really trivial
> > to see. If the changelog explicitly stated that the proportional memory
> > reclaim is not sufficient because XYZ and the implementation has been
> > changed to instead meet the high limit target then this would be a
> > completely different story and I believe we could have saved some
> > discussion.
> 
> I agree that the changelog can be made more clear. Any objection if I send
> v2 with changelog changes to that effect, then? :-)

Yes, please. I would also highly appreciate having the above addressed,
so that we do not have to scratch our heads over why a particular design
decision was made and argue about the thinking behind it.

> > > And somehow this is controversial, and we're just changing around user
> > > promises as we see fit for our particular usecase?
> > > 
> > > I don't even understand how the supposed alternate semantics you read
> > > between the lines in the documentation would make for a useful
> > > feature: It may fail to contain a group of offending tasks to the
> > > configured limit, but it will be fair to those tasks while doing so?
> > > 
> > > > But if you really want to push this through then let's do it
> > > > properly at least. memcg->memcg_nr_pages_over_high has only very
> > > > vague meaning if the reclaim target is the high limit.
> > > 
> > > task->memcg_nr_pages_over_high is not vague, it's a best-effort
> > > mechanism to distribute fairness. It's the current task's share of the
> > > cgroup's overage, and it allows us in the majority of situations to
> > > distribute reclaim work and sleeps in proportion to how much the task
> > > is actually at fault.
> > 
> > Agreed. But this stops being the case as soon as the reclaim target has
> > been reached and new reclaim attempts are enforced because the memcg is
> > still above the high limit. Because then you have a completely different
> > reclaim target - get down to the limit. This would be especially visible
> > with a large memcg_nr_pages_over_high which could even lead to an over
> > reclaim.
> 
> We actually over reclaim even before this patch -- this patch doesn't bring
> much new in that regard.
> 
> Tracing try_to_free_pages for a cgroup at the memory.high threshold shows
> that before this change, we sometimes even reclaim on the order of twice the
> number of pages requested. For example, I see cases where we requested 1000
> pages to be reclaimed, but end up reclaiming 2000 in a single reclaim
> attempt.

This is interesting and worth looking into. I am aware that we can
potentially reclaim many more pages during the icache reclaim, and that
there was a heated discussion without any fix merged in the end, IIRC.
Do you have any details?
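
To make the two reclaim targets discussed in this thread concrete, here is
a toy userspace model in C. It is not kernel code: the names toy_memcg,
toy_reclaim, reclaim_task_share and reclaim_to_high, and the fixed "yield"
percentage, are all assumptions invented for illustration. The sketch only
contrasts a single attempt sized by the task's share of the overage with
repeated attempts until the group is back at its high limit.

```c
/* Toy model only: hypothetical names, not the kernel's memcg code. */
struct toy_memcg {
	unsigned long usage;	/* pages currently charged */
	unsigned long high;	/* memory.high, in pages */
};

/*
 * Pretend reclaim. yield_pct models how much is actually freed relative
 * to the request: 50 means reclaim gives up early, 200 means it frees
 * twice what was asked for (the 1000 requested vs 2000 reclaimed case).
 */
static unsigned long toy_reclaim(struct toy_memcg *cg, unsigned long want,
				 unsigned int yield_pct)
{
	unsigned long freed = want * yield_pct / 100;

	if (freed > cg->usage)
		freed = cg->usage;
	cg->usage -= freed;
	return freed;
}

/* Proportional: one attempt, sized by the task's share of the overage. */
static unsigned long reclaim_task_share(struct toy_memcg *cg,
					unsigned long task_over_high,
					unsigned int yield_pct)
{
	return toy_reclaim(cg, task_over_high, yield_pct);
}

/*
 * Target based: keep retrying in task-share sized chunks until usage is
 * no longer above high, or the retry budget runs out.
 */
static unsigned long reclaim_to_high(struct toy_memcg *cg,
				     unsigned long task_over_high,
				     unsigned int yield_pct, int max_retries)
{
	unsigned long total = 0;
	int i;

	for (i = 0; i < max_retries && cg->usage > cg->high; i++)
		total += toy_reclaim(cg, task_over_high, yield_pct);
	return total;
}
```

In this model, with usage at 10000 pages, high at 8000, a task share of 500
and a 50% yield, the single proportional attempt frees 250 pages and leaves
the group well above high, while the retrying variant stops once usage
reaches 8000. With a large task share (1500) and a 200% yield, the first
retry alone frees 3000 pages and lands 1000 pages below high, which is the
kind of over reclaim a large memcg_nr_pages_over_high could produce.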

-- 
Michal Hocko
SUSE Labs