linux-kernel - Re: [RFC 0/4] Introduce unbalance proactive reclaim

Message-ID: <f46de374-82a2-467c-8d32-a15b518bff17@vivo.com>
Date:   Fri, 10 Nov 2023 11:48:49 +0800
From:   Huan Yang <link@...o.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     "Huang, Ying" <ying.huang@...el.com>, Tejun Heo <tj@...nel.org>,
        Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Shakeel Butt <shakeelb@...gle.com>,
        Muchun Song <muchun.song@...ux.dev>,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        Peter Xu <peterx@...hat.com>,
        "Vishal Moola (Oracle)" <vishal.moola@...il.com>,
        Yosry Ahmed <yosryahmed@...gle.com>,
        Liu Shixin <liushixin2@...wei.com>,
        Hugh Dickins <hughd@...gle.com>, cgroups@...r.kernel.org,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, opensource.kernel@...o.com
Subject: Re: [RFC 0/4] Introduce unbalance proactive reclaim


On 2023/11/9 21:46, Michal Hocko wrote:
> On Thu 09-11-23 21:07:29, Huan Yang wrote:
> [...]
>>>> Of course, as you suggested, madvise can also achieve this, but
>>>> implementing it in the agent may be more complex.(In terms of
>>>> achieving the same goal, using memcg to group all the processes of an
>>>> APP and perform proactive reclamation is simpler than using madvise
>>>> and scanning multiple processes of an application using an agent?)
>>> It might be more involved but the primary question is whether it is
>>> usable for the specific use case. Madvise interface is not LRU aware but
>>> you are not really talking about that to be a requirement? So it would
>>> really help if you go deeper into details on how is the interface
>>> actually supposed to be used in your case.
>> In mobile field, we usually configure zram to compress anonymous page.
>> We can approximate to expand memory usage with limited hardware memory
>> by using zram.
>>
>> With proper strategies, an 8GB RAM phone can approximate the usage of a
>> 12GB phone (or more).
>>
>> In our strategy, we group memcg by application. When the agent detects
>> that an application has entered the background, then frozen, and has not
>> been used for a long time, the agent will slowly issue commands to
>> reclaim the anonymous page of that application.
>>
>> With this interface, `echo memory anon > memory.reclaim`
> This doesn't really answer my questions above.
>    
>>> Also make sure to explain why you cannot use other existing interfaces.
>>> For example, why you simply don't decrease the limit of the frozen
>>> cgroup and rely on the normal reclaim process to evict the most cold
>> This is a question of reclamation tendency, and simply decreasing the limit
>> of the frozen cgroup cannot achieve this.
> Why?

May I ask how lowering the limit can restrict reclamation to anonymous
pages only?

>>> memory? What are you basing your anon vs. file proportion decision on?
>> When zram is configured and anonymous pages are reclaimed proactively,
>> the refault probability of anonymous pages is low when an application is
>> frozen and not reopened.
>> Also, the cost of refaulting from zram is relatively low.
>>
>> However, file pages usually have shared properties, so even if an
>> application is frozen, other processes may still access the file pages.
>> If a limit is set and the reclamation encounters file pages, it will
>> cause a certain amount of refault I/O, which is costly for mobile
>> devices.
> Two points here (and the reason why I am repeatedly asking for some
> data) 1) are you really seeing shared and actively used page cache pages
When we use the current proactive reclaim interface to reclaim memory,
our debug program usually shows that some file pages are reclaimed.

However, when we then start other apps for testing (with the reclaimed
app in the background), the trace shows a lot of block I/O attributed to
the background application.
> being reclaimed? 2) Is the refault IO really a problem. What kind of
> storage those phone have that this is more significant than potentially
> GB of compressed anonymous memory which would need CPU to refaulted
Phones typically use UFS.
> back. I mean do you have any actual numbers to show that the default
> reclaim strategy would lead to a less utilized or less performant
> system?
Also, when the application returns to the foreground, its startup may be
slower, and the trace again shows a lot of block I/O (usually an I/O
count of 1000+ and 200+ ms of I/O time).
By contrast, we usually observe very little block I/O caused by zram
refaults. zram (read: 1698.39 MB/s, write: 995.109 MB/s) is usually
faster than random disk reads on UFS (read: 48.1907 MB/s, write:
49.1654 MB/s). These numbers come from zram-perf, which I modified
slightly so that it can also test UFS.
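
To put the throughput figures above side by side, here is a small
arithmetic sketch. It only divides the numbers quoted in this message;
it assumes both sets were measured under comparable conditions, which
this message does not establish, so treat the ratios as a rough
indication only.

```python
# Throughput figures copied from the message above, in MB/s.
zram = {"read": 1698.39, "write": 995.109}
ufs = {"read": 48.1907, "write": 49.1654}


def speedup(fast, slow):
    """Return the per-operation ratio fast/slow."""
    return {op: fast[op] / slow[op] for op in fast}


ratios = speedup(zram, ufs)
for op, ratio in ratios.items():
    print(f"{op}: zram is about {ratio:.1f}x the UFS figure")
```

On these numbers zram comes out roughly 35x faster for reads and 20x
faster for writes.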

Therefore, if proactive reclaim hits many file pages, the application
may be slow when it is reopened.
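
For illustration, the agent behaviour described earlier in this thread
(slowly issuing anon-only reclaim commands for a frozen application's
memcg) could look roughly like the sketch below. This is not the actual
agent. The "<bytes> anon" command string follows the syntax proposed by
this RFC and is not an existing upstream interface; the function names,
chunk size, and delay are all hypothetical.

```python
import os
import time


def build_reclaim_command(nbytes, anon_only=True):
    # "anon" is the type keyword proposed in this RFC (an assumption here);
    # without it, only the byte count is written.
    return f"{nbytes} anon" if anon_only else str(nbytes)


def write_file(path, data):
    # A real agent would also handle OSError here, e.g. EAGAIN when fewer
    # bytes were reclaimed than requested.
    with open(path, "w") as f:
        f.write(data)


def reclaim_slowly(cgroup_dir, total_bytes, chunk_bytes, delay_s=1.0,
                   anon_only=True, write=write_file, sleep=time.sleep):
    """Issue reclaim requests in small chunks, pausing between them."""
    path = os.path.join(cgroup_dir, "memory.reclaim")
    issued = 0
    while issued < total_bytes:
        step = min(chunk_bytes, total_bytes - issued)
        write(path, build_reclaim_command(step, anon_only))
        issued += step
        if issued < total_bytes:
            sleep(delay_s)  # spread the work out instead of one big request
    return issued
```

The writer and sleep functions are injectable so that the pacing logic
can be exercised without touching a real cgroup.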

-- 
Thanks,
Huan Yang