linux-kernel - Re: memcg reclaim demotion wrt. isolation

Message-ID: <87edt1dwd2.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date:   Thu, 15 Dec 2022 14:17:13 +0800
From:   "Huang, Ying" <ying.huang@...el.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Yang Shi <shy828301@...il.com>, Wei Xu <weixugc@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: memcg reclaim demotion wrt. isolation

Michal Hocko <mhocko@...e.com> writes:

> On Tue 13-12-22 17:14:48, Johannes Weiner wrote:
>> On Tue, Dec 13, 2022 at 04:41:10PM +0100, Michal Hocko wrote:
>> > Hi,
>> > I have just noticed that page allocations for demotion targets
>> > include __GFP_KSWAPD_RECLAIM (through GFP_NOWAIT). This has been the case
>> > since the code was introduced by 26aa2d199d6f ("mm/migrate: demote
>> > pages during reclaim"). I suspect the intention is to trigger the aging
>> > on the fallback node and either drop or further demote oldest pages.
>> > 
>> > This makes sense, but I suspect it was not also intended for
>> > memcg-triggered reclaim. It would mean that memory pressure in one
>> > hierarchy could trigger paging out pages of a different hierarchy if the
>> > demotion target is close to full.
>> 
>> This is also true if you don't do demotion. If a cgroup tries to
>> allocate memory on a full node (e.g. via mbind()), it may wake kswapd or
>> enter global reclaim directly which may push out the memory of other
>> cgroups, regardless of the respective cgroup limits.
>
> You are right on this. But this is describing a slightly different
> situation IMO.
>
>> The demotion allocations don't strike me as any different. They're
>> just allocations on behalf of a cgroup. I would expect them to wake
>> kswapd and reclaim physical memory as needed.
>
> I am not sure this is the expected behavior. Consider the currently
> discussed memory.demote interface, through which userspace can trigger
> (almost) arbitrary demotions. This can deplete fallback nodes without
> overcommitting memory overall, yet still push out demoted memory from
> other workloads. From the user's POV this would look like reclaim while
> overall memory is far from depleted, so it would be considered
> premature and warrant a bug report.
>
> The reclaim behavior would make more sense to me if it was constrained
> to the allocating memcg hierarchy so unrelated lruvecs wouldn't be
> disrupted.

When we reclaim/demote some pages from a memcg proactively, what is our
goal?  To free up some memory in this memcg for other memcgs to use?  If
so, it sounds reasonable to keep as many pages of the other memcgs as
possible.

Best Regards,
Huang, Ying