linux-kernel - Re: [RFC 0/4] Introduce unbalance proactive reclaim

Message-ID: <e5099669-3d99-4a9d-b56e-15ce4fc3f366@vivo.com>
Date:   Tue, 14 Nov 2023 20:37:07 +0800
From:   Huan Yang <link@...o.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     "Huang, Ying" <ying.huang@...el.com>, Tejun Heo <tj@...nel.org>,
        Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Shakeel Butt <shakeelb@...gle.com>,
        Muchun Song <muchun.song@...ux.dev>,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        Peter Xu <peterx@...hat.com>,
        "Vishal Moola (Oracle)" <vishal.moola@...il.com>,
        Yosry Ahmed <yosryahmed@...gle.com>,
        Liu Shixin <liushixin2@...wei.com>,
        Hugh Dickins <hughd@...gle.com>, cgroups@...r.kernel.org,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, opensource.kernel@...o.com
Subject: Re: [RFC 0/4] Introduce unbalance proactive reclaim


On 2023/11/14 18:04, Michal Hocko wrote:
> On Mon 13-11-23 09:54:55, Huan Yang wrote:
>> On 2023/11/10 20:32, Michal Hocko wrote:
>>> On Fri 10-11-23 14:21:17, Huan Yang wrote:
>>> [...]
>>>>> BTW: how do you know the number of pages to be reclaimed proactively in
>>>>> memcg proactive reclaiming based solution?
>>>> One point here is that we are not sure how long the frozen application
>>>> will be opened, it could be 10 minutes, an hour, or even days.  So we
>>>> need to predict and try, gradually reclaim anonymous pages in
>>>> proportion, preferably based on the LRU algorithm.  For example, if
>>>> the application has been frozen for 10 minutes, reclaim 5% of
>>>> anonymous pages; 30min:25%anon, 1hour:75%, 1day:100%.  It is even more
>>>> complicated as it requires adding a mechanism for predicting failure
>>>> penalties.
>>> Why would you make your reclaiming decisions based on time rather than the
>>> actual memory demand? I can see how a pro-active reclaim could make a
>>> head room for an unexpected memory pressure but applying more pressure
>>> just because of inactivity sounds rather dubious to me TBH. Why cannot
>>> you simply wait for the external memory pressure (e.g. from kswapd) to
>>> deal with that based on the demand?
>> Because the current kswapd and direct memory reclamation are a passive
>> memory reclamation based on the watermark, and in the event of triggering
>> these reclamation scenarios, the smoothness of the phone application cannot
>> be guaranteed.
> OK, so you are worried about latencies on spike memory usage.
>
>> (We often observe that when the above reclamation is triggered, there
>> is a delay in the application startup, usually accompanied by block
>> I/O, and some concurrency issues caused by lock design.)
> Does that mean you do not have enough head room for kswapd to keep up with
Yes, but if we set the high watermark a little higher, power consumption
becomes very high, and we usually observe kswapd running frequently.
Even with a low kswapd watermark, kswapd CPU usage can still be high in
some extreme scenarios (for example, when starting a large application
that needs to acquire a large amount of memory in a short period of
time). However, we will not discuss this in detail here; the reasons are
quite complex, and we do not yet fully understand them.
> the memory demand? It is really hard to discuss this without some actual
> numbers or more specifics.
>   
>> To ensure the smoothness of application startup, we have a module in
>> Android called LMKD (formerly known as lowmemorykiller). Based on a
>> certain algorithm, LMKD detects if application startup may be delayed
>> and proactively kills inactive applications.  (For example, based on
>> factors such as refault IO and swap usage.)
>>
>> However, this behavior may cause the applications we want to protect
>> to be killed, which will result in users having to wait for them to
>> restart when they are reopened, which may affect the user
>> experience.(For example, if the user wants to reopen the application
>> interface they are working on, or re-enter the order interface they
>> were viewing.)
> This suggests that your LMKD doesn't pick up the right victim to kill.
> And I suspect this is a fundamental problem of those pro-active oom
Yes, but our current LMKD configuration is already very conservative,
which can cause lag in some scenarios. We will not analyze the reasons
in detail here.
> killer solutions.
>
>> Therefore, the above proactive reclamation interface is designed to
>> compress memory types with minimal cost for upper-layer applications
>> based on reasonable strategies, in order to avoid triggering LMKD or
>> memory reclamation as much as possible, even if it is not balanced.
> This would suggest that MADV_PAGEOUT is really what you are looking for.
Yes, I agree, especially to avoid reclaiming shared anonymous pages.

However, I did some shallow research and found that MADV_PAGEOUT does
not reclaim pages with mapcount != 1. Our applications are usually
composed of multiple processes, and some anonymous pages are shared
among them. When the application is frozen, memory that is shared only
among the processes within that application should be released, but
MADV_PAGEOUT does not seem suitable for this scenario. (If I have
misunderstood anything, please correct me.)
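
For reference, a minimal userspace sketch of the MADV_PAGEOUT hint on a
private anonymous mapping, the case where each page has mapcount == 1.
The helper names alloc_private_anon() and pageout_range() are made up
for illustration and are not part of this RFC. On kernels older than
5.4 the madvise() call is expected to fail with EINVAL.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* value from the Linux uapi headers (5.4+) */
#endif

/*
 * Hypothetical helper: map private anonymous memory and touch every
 * byte so the pages are actually populated. Returns NULL on failure.
 */
static unsigned char *alloc_private_anon(size_t len, unsigned char fill)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	memset(p, fill, len);
	return p;
}

/*
 * Hypothetical helper: ask the kernel to reclaim [addr, addr + len) now.
 * This is only a hint; the data stays valid and is faulted back in
 * (from swap or zram, if configured) on the next access.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int pageout_range(void *addr, size_t len)
{
	return madvise(addr, len, MADV_PAGEOUT);
}
```

A caller would populate a region with alloc_private_anon() and then
pass the same address and length to pageout_range(). To apply the hint
to another process, process_madvise() takes the same advice value.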

In addition, I still suspect that this approach would cost a lot of
resources on the policy side, but it is worth studying.

Thanks.
> If you really aim at compressing a specific type of memory then tweaking
> reclaim to achieve that sounds like a shortcut because madvise based
> solution is more involved. But that is not a solid justification for
> adding a new interface.
Yes, but this RFC only adds an additional configuration option to the
proactive reclaim interface. In the reclaim path, requests that carry
such a reclaim tendency are checked for and handled first. Since that
check is wrapped in `unlikely`, it should not have much impact.
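
To illustrate the point about `unlikely`, here is a simplified
userspace sketch, not the code from the patches. The enum, the struct
and pick_scan_targets() are hypothetical names, and the default branch
is reduced to "scan both lists" instead of the real balancing logic.

```c
/* Userspace stand-in for the kernel's unlikely() annotation. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical reclaim tendency attached to a proactive request. */
enum reclaim_hint {
	HINT_NONE = 0,	/* ordinary reclaim, no tendency */
	HINT_ANON_ONLY,	/* only scan anonymous pages */
	HINT_FILE_ONLY,	/* only scan file pages */
};

/* Hypothetical request: the hint plus the sizes of both LRU lists. */
struct scan_request {
	enum reclaim_hint hint;
	unsigned long nr_anon;
	unsigned long nr_file;
};

/*
 * Decide how many pages to scan from each list. The hinted case is
 * marked unlikely, so the common path costs one predictable branch.
 */
static void pick_scan_targets(const struct scan_request *req,
			      unsigned long *anon, unsigned long *file)
{
	if (unlikely(req->hint != HINT_NONE)) {
		*anon = req->hint == HINT_ANON_ONLY ? req->nr_anon : 0;
		*file = req->hint == HINT_FILE_ONLY ? req->nr_file : 0;
		return;
	}

	/* Simplified default: scan both lists. */
	*anon = req->nr_anon;
	*file = req->nr_file;
}
```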

-- 
Thanks,
Huan Yang