Message-ID: <87v99yvzq8.fsf@yhuang-dev.intel.com>
Date: Thu, 11 Mar 2021 16:52:47 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Shakeel Butt <shakeelb@...gle.com>
Cc: Tejun Heo <tj@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Mel Gorman <mgorman@...e.de>,
Johannes Weiner <hannes@...xchg.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Michal Hocko <mhocko@...e.cz>,
Joonsoo Kim <iamjoonsoo.kim@....com>, Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH] vmscan: retry without cache trim mode if nothing scanned

Hi, Shakeel,

Shakeel Butt <shakeelb@...gle.com> writes:

> On Wed, Mar 10, 2021 at 4:47 PM Huang, Ying <ying.huang@...el.com> wrote:
>>
>> From: Huang Ying <ying.huang@...el.com>
>>
>> In shrink_node(), to determine whether to enable cache trim mode, the
>> inactive file LRU size is obtained via lruvec_page_state(), which reads
>> a per-CPU counter (mem_cgroup_per_node->lruvec_stat[]). The error of
>> the per-CPU counter, which comes from CPU-local batching and from the
>> descendant memory cgroups, can cause problems. We ran into this in the
>> 0-Day performance test.
>>
>> 0-Day uses a RAM file system as the root file system, so the number of
>> reclaimable file pages is very small. In the swap test, the inactive
>> file LRU list soon becomes almost empty, but its size as read from the
>> per-CPU counter may stay at a much larger value (say, 33, 50, etc.).
>> This enables cache trim mode even though nothing can actually be
>> scanned. The following pattern repeats for a long time in the test:
>>
>> priority inactive_file_size cache_trim_mode
>> 12 33 0
>> 11 33 0
>> ...
>> 6 33 0
>> 5 33 1
>> ...
>> 1 33 1
>>
>> That is, cache_trim_mode is wrongly enabled once the scan priority
>> drops to 5, and the situation does not recover for a long time.
>>
>> It's hard to get a more accurate size of the inactive file list
>> without much more overhead, and it's also hard to estimate the error
>> of the per-CPU counter, because there may be many descendant memory
>> cgroups. But if the actual scan finds nothing to scan while cache
>> trim mode is enabled, enabling it was evidently the wrong decision,
>> so we can retry with cache trim mode disabled. This patch implements
>> this policy.
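
For context, the cache trim mode check in shrink_node() that the
description above refers to is roughly the following (paraphrased from
mm/vmscan.c, so details may differ between kernel versions), together
with a sketch of the retry idea; cache_trim_mode_failed is an
illustrative local flag, not a name taken from the actual patch:

	again:
		...
		nr_scanned = sc->nr_scanned;

		/*
		 * The inactive file size comes from the per-CPU lruvec
		 * counters, so it can stay at a small stale value (e.g.
		 * the 33 above) even when the list is in fact empty.
		 */
		file = lruvec_page_state(target_lruvec, NR_INACTIVE_FILE);
		if (file >> sc->priority &&
		    !(sc->may_deactivate & DEACTIVATE_FILE) &&
		    !cache_trim_mode_failed)
			sc->cache_trim_mode = 1;
		else
			sc->cache_trim_mode = 0;

		...
		shrink_node_memcgs(pgdat, sc);

		/*
		 * Retry sketch: if cache trim mode was on but the scan
		 * made no progress at all, the mode was enabled from a
		 * stale counter value; remember that and redo the node
		 * scan with the mode disabled.
		 */
		if (sc->cache_trim_mode && sc->nr_scanned == nr_scanned) {
			cache_trim_mode_failed = true;
			goto again;
		}
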
>
> Instead of playing with the already complicated heuristics, we should
> improve the accuracy of the lruvec stats. Johannes already fixed the
> memcg stats using rstat infrastructure and Tejun has suggestions on
> how to use rstat infrastructure efficiently for lruvec stats at
> https://lore.kernel.org/linux-mm/YCFgr300eRiEZwpL@slm.duckdns.org/.

Thanks for the information! It would be better if we could improve the
accuracy of the lruvec stats without much overhead, but that may not be
an easy task.

If my understanding is correct, what Tejun suggested is to add a fast
read interface to rstat for use in hot paths, with accuracy similar to
that of the traditional per-CPU counters. But if we could regularly
update the lruvec rstat with something like vmstat_update(), that
should be enough to address the issue described in this patch.
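
Something in the spirit of vmstat_update() for the lruvec counters could
look like the sketch below. It is purely illustrative: lruvec_stat_work
and lruvec_stat_update() are made-up names, and a real implementation
would need to handle the details that vmstat_update() handles (per-CPU
scheduling, the shepherd work, idle CPUs, etc.):

	/*
	 * Hypothetical periodic flush, analogous to vmstat_update():
	 * fold the per-CPU lruvec deltas into the cgroup tree so that
	 * readers such as shrink_node() see a value whose error is
	 * bounded by the flush interval instead of growing with the
	 * number of CPUs and descendant cgroups.
	 */
	static DEFINE_PER_CPU(struct delayed_work, lruvec_stat_work);

	static void lruvec_stat_update(struct work_struct *w)
	{
		cgroup_rstat_flush(root_mem_cgroup->css.cgroup);

		queue_delayed_work_on(smp_processor_id(), mm_percpu_wq,
				      this_cpu_ptr(&lruvec_stat_work),
				      round_jiffies_relative(sysctl_stat_interval));
	}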

Best Regards,
Huang, Ying