linux-kernel - Re: [RFC 0/4] Introduce unbalance proactive reclaim

Message-ID: <b4694fbf-92df-4067-878e-6035df46582f@vivo.com>
Date:   Mon, 13 Nov 2023 10:17:57 +0800
From:   Huan Yang <link@...o.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     "Huang, Ying" <ying.huang@...el.com>, Tejun Heo <tj@...nel.org>,
        Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Shakeel Butt <shakeelb@...gle.com>,
        Muchun Song <muchun.song@...ux.dev>,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        Peter Xu <peterx@...hat.com>,
        "Vishal Moola (Oracle)" <vishal.moola@...il.com>,
        Yosry Ahmed <yosryahmed@...gle.com>,
        Liu Shixin <liushixin2@...wei.com>,
        Hugh Dickins <hughd@...gle.com>, cgroups@...r.kernel.org,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, opensource.kernel@...o.com
Subject: Re: [RFC 0/4] Introduce unbalance proactive reclaim


On 2023/11/10 20:24, Michal Hocko wrote:
> On Fri 10-11-23 11:48:49, Huan Yang wrote:
> [...]
>> Also, when the application enters the foreground, its startup may be
>> slower, and traces show a lot of block I/O (usually 1000+ I/O requests
>> and 200+ ms of I/O time). We usually observe very little block I/O
>> caused by zram refault. zram (read: 1698.39MB/s, write: 995.109MB/s)
>> is usually faster than random disk reads (read: 48.1907MB/s, write:
>> 49.1654MB/s). This was tested with zram-perf, which I changed a little
>> to test UFS.
>>
>> Therefore, if the proactive reclamation encounters many file pages,
>> the application may become slow when it is opened.
> OK, this is interesting information. From the above it seems that
> storage based IO refaults are an order of magnitude more expensive than
> swap (zram in this case). That means that the memory reclaim should
> _in general_ prefer anonymous memory reclaim over refaulted page cache,
> right? Or is there any reason why "frozen" applications are any
> different in this case?
Frozen applications mean that the application process is no longer active,
so once its private anonymous page data is swapped out, the anonymous
pages will not be refaulted until the application becomes active again.

By contrast, page cache is usually shared. Even if the application that
first read the file is no longer active, other processes may still read
the file. Therefore, it is not reasonable to use the proactive reclaim
interface to reclaim page cache without considering memory pressure.

Considering the difference in reclaim cost between anonymous pages and
page cache, we arrived at the idea of unbalanced reclaim described above.
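
To put the cost difference in numbers, here is a rough sketch that takes the
throughput figures quoted earlier in this thread and computes how much faster
zram was than random UFS I/O in that test. The variable and function names are
only illustrative, and raw throughput is just a proxy for refault cost, not a
measurement of refault latency.

```python
# Throughput figures quoted in this thread (MB/s), from the zram-perf runs.
ZRAM_MBPS = {"read": 1698.39, "write": 995.109}
UFS_RANDOM_MBPS = {"read": 48.1907, "write": 49.1654}


def speedup(fast, slow):
    """Return how many times faster `fast` is than `slow`, per operation."""
    return {op: fast[op] / slow[op] for op in fast}


ratios = speedup(ZRAM_MBPS, UFS_RANDOM_MBPS)
for op, ratio in ratios.items():
    print(f"{op}: zram is about {ratio:.1f}x faster than random UFS I/O")
```

On these figures the gap is roughly 35x for reads and 20x for writes, which is
consistent with the "order of magnitude" reading of the data.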
>
> Our traditional interface to control the anon vs. file balance has been
> swappiness. It is not the best interface and it has its flaws but
> have you experimented with the global swappiness to express that
> preference? What were your observations? Please note that the behavior
We have tested this and found that no version of the code gives
swappiness strict priority control over the reclaim decision.

This means that even if we set swappiness to 0 or 200, we cannot
achieve unbalanced reclaim unless certain conditions are met during
the reclaim process. Under some conditions we may still reclaim file
pages by mistake, and since we usually trigger proactive reclaim while
memory is still sufficient (before LMKD triggers), this causes higher
block IO.

This RFC code provides some flags that take the highest priority when
setting the reclaim tendency. Currently, they can only be used through
the proactive reclaim interface.
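
For context, a minimal sketch of how userspace drives the existing proactive
reclaim interface: cgroup v2 exposes a memory.reclaim file, and writing a byte
count to it asks the kernel to reclaim that much from the cgroup. The helper
name and the temporary directory standing in for a cgroup are assumptions made
for illustration. The flags proposed by this RFC are deliberately left out,
because their syntax is not shown in this message.

```python
import os
import tempfile


def request_reclaim(cgroup_dir, nbytes):
    """Ask the kernel to proactively reclaim `nbytes` from one cgroup.

    Writes the byte count to <cgroup_dir>/memory.reclaim. On a real cgroup v2
    mount the write can fail (for example with EAGAIN when less than the
    requested amount was reclaimed), so callers should expect OSError.
    """
    path = os.path.join(cgroup_dir, "memory.reclaim")
    with open(path, "w") as f:
        f.write(str(nbytes))
    return path


if __name__ == "__main__":
    # Dry run against a temporary directory (a stand-in for a real cgroup
    # path) so the sketch can be executed without touching the system.
    with tempfile.TemporaryDirectory() as fake_cgroup:
        written = request_reclaim(fake_cgroup, 64 * 1024 * 1024)
        with open(written) as f:
            print(f.read())
```

With only a byte count to pass, the caller cannot express an anon versus file
preference per request, which is the gap the flags above are meant to fill.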
> might be really different with different kernel versions so I would
> really stress out that testing with the current Linus (or akpm) tree is
> necessary.
OK, thank you for the reminder.
>
> Anyway, the more I think about that the more I am convinced that
> explicit anon/file extension for the memory.reclaim interface is just a
> wrong way to address a more fundamental underlying problem. That is, the
> default reclaim choice over anon vs file preference should consider the
> cost of the refaulting IO. This is more a property of the underlying
> storage than a global characteristic. In other words, say you have
> multiple storages, one that is network based with a high latency and
> the other that is a local fast SSD. Reclaiming a page backed by the slower
> storage is going to be more expensive to refault than the one backed by
> the fast storage.  So even page cache pages are not really all the same.
>
> It is quite likely that an IO cost aspect is not really easy to integrate
> into the memory reclaim but it seems to me this is a better way to focus
> on for a better long term solution. Our existing refaulting
> infrastructure should help in that respect. Also MGLRU could fit for
> that purpose better than the traditional LRU based reclaim as the higher
> generations could be used for more expensive pages.

Yes, your insights are very informative.

However, until our algorithm is perfected, I think it is reasonable to
provide different reclaim tendencies for the proactive reclaim interface.
This will give the strategy layer greater flexibility.
For example, on mobile phones, we can weigh the combined impact of
refault IO overhead and LMKD killing when choosing a reclaim tendency
for the proactive reclaim interface.

-- 
Thanks,
Huan Yang