linux-kernel - Re: [PATCH -V2] mm: move idle swap cache pages to the tail of LRU after COW

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <871r9nbxc9.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date:   Mon, 31 May 2021 07:44:54 +0800
From:   "Huang, Ying" <ying.huang@...el.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Matthew Wilcox <willy@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Xu <peterx@...hat.com>, Hugh Dickins <hughd@...gle.com>,
        Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...riel.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Michal Hocko <mhocko@...nel.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Tim Chen <tim.c.chen@...el.com>
Subject: Re: [PATCH -V2] mm: move idle swap cache pages to the tail of LRU
 after COW

Johannes Weiner <hannes@...xchg.org> writes:

> On Thu, May 27, 2021 at 04:49:53PM +0800, Huang Ying wrote:
>> With commit 09854ba94c6a ("mm: do_wp_page() simplification"), after
>> COW, the idle swap cache (neither the page nor the corresponding swap
>> entry is mapped by any process) will be left at the original position
>> in the LRU list.  While it may be in the active list or the head of
>> the inactive list, so that vmscan may take more overhead or time to
>> reclaim these actually unused pages.
>> 
>> To help the page reclaiming, in this patch, after COW, the idle swap
>> cache will be tried to be moved to the tail of the inactive LRU list.
>> To avoid to introduce much overhead to the hot COW code path, all
>> locks are acquired with try locking.
>> 
>> To test the patch, we used pmbench memory accessing benchmark with
>> working-set larger than available memory on a 2-socket Intel server
>> with a NVMe SSD as swap device.  Test results shows that the pmbench
>> score increases up to 21.8% with the decreased size of swap cache and
>> swapin throughput.
>> 
>> Signed-off-by: "Huang, Ying" <ying.huang@...el.com>
>> Suggested-by: Matthew Wilcox <willy@...radead.org>
>> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
>> Cc: Peter Xu <peterx@...hat.com>
>> Cc: Hugh Dickins <hughd@...gle.com>
>> Cc: Johannes Weiner <hannes@...xchg.org>
>> Cc: Mel Gorman <mgorman@...e.de>
>> Cc: Rik van Riel <riel@...riel.com>
>> Cc: Andrea Arcangeli <aarcange@...hat.com>
>> Cc: Michal Hocko <mhocko@...nel.org>
>> Cc: Dave Hansen <dave.hansen@...el.com>
>> Cc: Tim Chen <tim.c.chen@...el.com>
>> 
>> V2:
>> 
>> - Move trylock_page() to try_to_free_idle_swapcache() per Rik and
>>   Linus' comments.
>> - Fix PageLRU() checking.
>> - Fix THP processing.
>> - Rename the function.
>> ---
>>  include/linux/memcontrol.h | 10 ++++++++++
>>  include/linux/swap.h       |  3 +++
>>  mm/memcontrol.c            | 12 ++++++++++++
>>  mm/memory.c                |  2 ++
>>  mm/swapfile.c              | 39 ++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 66 insertions(+)
>
> Sorry the discussion fizzled out on the last patch.
>
> Let me try to recap this series: on your first submission you directly
> freed the old page if we copied. Linus was worried about overhead in
> the COW path that wouldn't pay off in a real workload. Before getting
> numbers, it was then suggested to move the pages to the tail of the
> LRU and leaving them to reclaim - which was also met with skepticism.
>
> V2 presented the LRU moving version with pmbench numbers that indeed
> show it pays off. However, much simpler direct freeing produces even
> better numbers in the same benchmark. We don't have numbers showing if
> the LRU shuffling would significantly fare better in other workloads.
>
> Purely looking at the code: whether we defer or free, we need to lock
> the page, take the LRU spinlock for this page, and touch the LRU
> linkage. If we free, we add the swapcache deletion and the page
> allocator, but it's most likely the percpu-cached fastpath. If we
> defer, reclaim needs to re-establish information about the page that
> we already had in the COW context, do another LRU operation, do the
> swapcache deletion and go through the allocator, but on cold caches.
>
> Personally, I'm a bit skeptical the extra code complexity and reclaim
> overhead in paging workloads will definitely pay off in intermittently
> paging loads (non-paging wouldn't have swap pages). As far as code
> goes, the point of 09854ba94c6a (+17, -42) was simplification, and
> this adds more lines back in another place. In particular it adds
> another lifetime variant to swap pages which are already somewhat
> unwieldy. OTOH, freeing is a two-liner reusing the swap unmap code:
>
> 	if (page_copied)
> 		free_swap_cache(old_page);

Yes.  This looks better than my previous version, which duplicated the
code of free_swap_cache().  Thanks for pointing this out.

Best Regards,
Huang, Ying