Message-Id: <0F7CF2A7-F671-4196-B8FD-F35E9556391B@gmail.com>
Date: Thu, 1 Sep 2022 02:18:10 -0700
From: Nadav Amit <nadav.amit@...il.com>
To: Yu Zhao <yuzhao@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Andi Kleen <ak@...ux.intel.com>,
Aneesh Kumar <aneesh.kumar@...ux.ibm.com>,
Catalin Marinas <catalin.marinas@....com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
Johannes Weiner <hannes@...xchg.org>,
Jonathan Corbet <corbet@....net>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Matthew Wilcox <willy@...radead.org>,
Mel Gorman <mgorman@...e.de>,
Michael Larabel <Michael@...haellarabel.com>,
Michal Hocko <mhocko@...nel.org>,
Mike Rapoport <rppt@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Tejun Heo <tj@...nel.org>, Vlastimil Babka <vbabka@...e.cz>,
Will Deacon <will@...nel.org>,
linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
LKML <linux-kernel@...r.kernel.org>,
Linux MM <linux-mm@...ck.org>, X86 ML <x86@...nel.org>,
page-reclaim@...gle.com, Barry Song <baohua@...nel.org>,
Brian Geffon <bgeffon@...gle.com>,
Jan Alexander Steffens <heftig@...hlinux.org>,
Oleksandr Natalenko <oleksandr@...alenko.name>,
Steven Barrett <steven@...uorix.net>,
Suleiman Souhlal <suleiman@...gle.com>,
Daniel Byrne <djbyrne@....edu>,
Donald Carr <d@...os-reins.com>,
Holger Hoffstätte <holger@...lied-asynchrony.com>,
Konstantin Kharlamov <Hi-Angel@...dex.ru>,
Shuang Zhai <szhai2@...rochester.edu>,
Sofia Trinh <sofia.trinh@....works>,
Vaibhav Jain <vaibhav@...ux.ibm.com>
Subject: Re: [PATCH v14 07/14] mm: multi-gen LRU: exploit locality in rmap
> On Aug 15, 2022, at 12:13 AM, Yu Zhao <yuzhao@...gle.com> wrote:
>
> Searching the rmap for PTEs mapping each page on an LRU list (to test
> and clear the accessed bit) can be expensive because pages from
> different VMAs (PA space) are not cache friendly to the rmap (VA
> space). For workloads mostly using mapped pages, searching the rmap
> can incur the highest CPU cost in the reclaim path.

Impressive work. Sorry if my feedback is not timely.

Just one minor point for thought, which can be left for a later cleanup.
>
> + for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
> + unsigned long pfn;
> +
> + pfn = get_pte_pfn(pte[i], pvmw->vma, addr);
> + if (pfn == -1)
> + continue;
> +
> + if (!pte_young(pte[i]))
> + continue;
> +
> + folio = get_pfn_folio(pfn, memcg, pgdat);
> + if (!folio)
> + continue;
> +
> + if (!ptep_test_and_clear_young(pvmw->vma, addr, pte + i))
> + continue;
> +

You have already checked that the PTE is young, so testing the return
value of ptep_test_and_clear_young() seems redundant: I do not see a way
in which the access-bit could be cleared under you while you hold the
ptl. IOW, there is no need for the "if" and the "continue".
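
I.e., roughly the following (just restating the suggestion against the
hunk quoted above, not tested):

		/*
		 * The PTE was already seen young above and the ptl is
		 * held, so the return value carries no new information.
		 */
		ptep_test_and_clear_young(pvmw->vma, addr, pte + i);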

It also makes me wonder whether a separate ptep_clear_young() would help
slightly: the access-bit is only an estimate anyhow, so a clear-only
helper (no test) could enable architecture-specific optimizations. On
x86, for instance, if the PTE is dirty, we may be able to clear the
access-bit without an atomic operation, which should be faster.
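
Roughly along these lines, as a sketch only (ptep_clear_young() is the
hypothetical helper suggested above; the x86 reasoning is that once the
dirty bit is set, a plain store cannot lose a concurrent hardware
dirty-bit update, and losing a racing accessed-bit set is tolerable
since the bit is only a hint):

static inline void ptep_clear_young(struct vm_area_struct *vma,
				    unsigned long addr, pte_t *ptep)
{
	pte_t pte = *ptep;

	if (pte_dirty(pte)) {
		/*
		 * The dirty bit is already set, so a racing hardware A/D
		 * update cannot set it again; a non-atomic store can only
		 * lose a concurrent accessed-bit set, which is acceptable
		 * for a heuristic.
		 */
		set_pte(ptep, pte_mkold(pte));
		return;
	}

	/* Otherwise fall back to the existing atomic test-and-clear. */
	ptep_test_and_clear_young(vma, addr, ptep);
}

The non-dirty path keeps the atomic behavior, so this would only elide
the locked operation for dirty PTEs.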