linux-kernel - Re: [RFC v3 PATCH 0/5] mm: memcontrol: do memory reclaim when offlining

Message-ID: <9de4bb4a-6bb7-e13a-0d9a-c1306e1b3e60@linux.alibaba.com>
Date:   Wed, 9 Jan 2019 14:09:20 -0800
From:   Yang Shi <yang.shi@...ux.alibaba.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     mhocko@...e.com, shakeelb@...gle.com, akpm@...ux-foundation.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC v3 PATCH 0/5] mm: memcontrol: do memory reclaim when
 offlining

On 1/9/19 1:23 PM, Johannes Weiner wrote:
> On Wed, Jan 09, 2019 at 12:36:11PM -0800, Yang Shi wrote:
>> As I mentioned above, if we know some page caches from some memcgs
>> are referenced one-off and unlikely shared, why just keep them
>> around to increase memory pressure?
> It's just not clear to me that your scenarios are generic enough to
> justify adding two interfaces that we have to maintain forever, and
> that they couldn't be solved with existing mechanisms.
>
> Please explain:
>
> - Unmapped clean page cache isn't expensive to reclaim, certainly
>    cheaper than the IO involved in new application startup. How could
>    recycling clean cache be a prohibitive part of workload warmup?

It is not about recycling. Those page caches might be referenced by the
memcg just once, and then nobody touches them until memory pressure
hits. They are also unlikely to be accessed again any time soon.

>
> - Why you cannot temporarily raise the kswapd watermarks right before
>    an important application starts up (your answer was sorta handwavy)

It could, but the kswapd watermark is global. Boosting it may cause
kswapd to reclaim memory from memcgs which we want to keep untouched.
Although v2's memory.low/memory.min could provide some protection, they
do not rule out such reclaim in general. And v1 doesn't have such
protection at all.
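For context, a minimal sketch of the "temporarily raise the watermarks" approach, assuming the knob in question is the global vm.watermark_scale_factor sysctl (/proc/sys/vm/watermark_scale_factor); the helper name boosted_watermarks and any boost value are hypothetical. Because the file holds one value for the whole machine, the boost affects reclaim from every memcg, which is the objection above.

```python
import contextlib

# Assumed knob: a single system-wide sysctl, not a per-memcg setting.
WATERMARK_KNOB = "/proc/sys/vm/watermark_scale_factor"


@contextlib.contextmanager
def boosted_watermarks(value, knob=WATERMARK_KNOB):
    """Temporarily raise kswapd watermarks, restoring the old value on exit.

    Hypothetical helper: needs root and a kernel exposing the sysctl.
    """
    with open(knob) as f:
        old = f.read().strip()
    with open(knob, "w") as f:
        f.write(str(value))
    try:
        yield old
    finally:
        # Restore the previous setting even if the startup step fails.
        with open(knob, "w") as f:
            f.write(old)
```

It would wrap the application launch, e.g. `with boosted_watermarks(200): ...`; everything kswapd reclaims meanwhile comes from whichever memcgs the global LRU scan hits.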

force_empty or wipe_on_offline could be used to target specific memcgs
whose workload we know exactly, or from which it is safe to reclaim
memory. IMHO, this gives better isolation.

>
> - Why you cannot use madvise/fadvise when an application whose cache
>    you won't reuse exits

Sure we can. But we can't guarantee that all applications use them
properly.
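The per-application route looks roughly like this: before exiting, each program would advise the kernel about every file whose cache it will not reuse. A minimal sketch with a hypothetical helper drop_file_cache; it only helps for programs that actually call it, which is the limitation above.

```python
import os


def drop_file_cache(path):
    """Tell the kernel the cached pages of `path` won't be reused.

    POSIX_FADV_DONTNEED with offset=0 and len=0 covers the whole file.
    Only clean pages can be dropped; dirty pages need writeback first.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```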

>
> - Why you couldn't set memory.high or memory.max to 0 after the
>    application quits and before you call rmdir on the cgroup

I recall I explained this in the review email for the first version.
Setting memory.high or memory.max to 0 would trigger direct reclaim,
which may stall the offlining of the memcg. But we have "restart a job
with the same name" logic in our use case (I'm not quite sure why they
do so). Basically, it creates a memcg with exactly the same name right
after the old one is deleted, possibly with a different limit or other
settings. The creation has to wait for the rmdir to finish.
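A minimal sketch of that restart sequence on cgroup v2, assuming the hierarchy is mounted at /sys/fs/cgroup and the old job's tasks have already exited (rmdir fails with EBUSY otherwise); write_knob and restart_job_cgroup are hypothetical names. The write to memory.max reclaims synchronously in the writer's context, so the rmdir and the mkdir of the replacement both wait behind it.

```python
import os

# Assumed cgroup v2 mount point.
CGROUP2_ROOT = "/sys/fs/cgroup"


def write_knob(cgdir, knob, value):
    """Write one value to a cgroup control file."""
    with open(os.path.join(cgdir, knob), "w") as f:
        f.write(value)


def restart_job_cgroup(name, root=CGROUP2_ROOT, reclaim_first=True):
    """Remove a job's cgroup and immediately recreate one with the same name.

    With reclaim_first, memory.max is set to 0 before rmdir; that write does
    direct reclaim in this thread, delaying both rmdir and the new mkdir.
    On cgroupfs the control files do not block rmdir, unlike a normal dir.
    """
    cgdir = os.path.join(root, name)
    if reclaim_first:
        write_knob(cgdir, "memory.max", "0")
    os.rmdir(cgdir)
    os.mkdir(cgdir)  # the new job may then get different limits
    return cgdir
```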

>
> Adding a permanent kernel interface is a serious measure. I think you
> need to make a much better case for it, discuss why other options are
> not practical, and show that this will be a generally useful thing for
> cgroup users and not just a niche fix for very specific situations.

I do understand your concern and the maintenance cost of a permanent
kernel interface. I'm not quite sure if this is generic enough; however,
Michal Hocko did mention "It seems we have several people asking for
something like that already.", so at least it does not sound like "a
niche fix for very specific situations".

In my first submission, I did reuse the force_empty interface to keep
the change less intrusive, at least without adding a new interface.
Since several people have been asking for something like that already,
Michal suggested a new knob instead of reusing force_empty.

Thanks,
Yang