linux-kernel - Re: [PATCH v3 2/9] mm: optimize do_wp_page() for fresh pages in local LRU pagevecs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <feccf0e4-61c4-3292-bd87-ff59c4fa0ec6@suse.cz>
Date:   Wed, 9 Mar 2022 18:53:23 +0100
From:   Vlastimil Babka <vbabka@...e.cz>
To:     David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        David Rientjes <rientjes@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        John Hubbard <jhubbard@...dia.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Yang Shi <shy828301@...il.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Jann Horn <jannh@...gle.com>, Michal Hocko <mhocko@...nel.org>,
        Nadav Amit <namit@...are.com>, Rik van Riel <riel@...riel.com>,
        Roman Gushchin <guro@...com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Peter Xu <peterx@...hat.com>,
        Donald Dutile <ddutile@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
        Liang Zhang <zhangliang5@...wei.com>, linux-mm@...ck.org
Subject: Re: [PATCH v3 2/9] mm: optimize do_wp_page() for fresh pages in local
 LRU pagevecs

On 1/31/22 17:29, David Hildenbrand wrote:
> For example, if a page just got swapped in via a read fault, the LRU
> pagevecs might still hold a reference to the page. If we trigger a
> write fault on such a page, the additional reference from the LRU
> pagevecs will prohibit reusing the page.
> 
> Let's conditionally drain the local LRU pagevecs when we stumble over a
> !PageLRU() page. We cannot easily drain remote LRU pagevecs and it might
> not be desirable performance-wise. Consequently, this will only avoid
> copying in some cases.
> 
> Add a simple "page_count(page) > 3" check first but keep the
> "page_count(page) > 1 + PageSwapCache(page)" check in place, as
> we want to minimize cases where we remove a page from the swapcache but
> won't be able to reuse it, for example, because another process has it
> mapped R/O, to not affect reclaim.
> 
> We cannot easily handle the following cases and we will always have to
> copy:
> 
> (1) The page is referenced in the LRU pagevecs of other CPUs. We really
>     would have to drain the LRU pagevecs of all CPUs -- most probably
>     copying is much cheaper.
> 
> (2) The page is already PageLRU() but is getting moved between LRU
>     lists, for example, for activation (e.g., mark_page_accessed()),
>     deactivation (MADV_COLD), or lazyfree (MADV_FREE). We'd have to
>     drain mostly unconditionally, which might be bad performance-wise.
>     Most probably this won't happen too often in practice.
> 
> Note that there are other reasons why an anon page might temporarily not
> be PageLRU(): for example, compaction and migration have to isolate LRU
> pages from the LRU lists first (isolate_lru_page()), moving them to
> temporary local lists and clearing PageLRU() and holding an additional
> reference on the page. In that case, we'll always copy.
> 
> This change seems to be fairly effective with the reproducer [1] shared
> by Nadav, as long as writeback is done synchronously, for example, using
> zram. However, with asynchronous writeback, we'll usually fail to free the
> swapcache because the page is still under writeback: something we cannot
> easily optimize for, and maybe it's not really relevant in practice.
> 
> [1] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com
> 
> Signed-off-by: David Hildenbrand <david@...hat.com>

Acked-by: Vlastimil Babka <vbabka@...e.cz>