Message-ID: <91ac638d-b2d6-4683-ab29-fb647f58af63@redhat.com>
Date: Wed, 26 Mar 2025 13:46:31 +0100
From: David Hildenbrand <david@...hat.com>
To: Jinjiang Tu <tujinjiang@...wei.com>, yangge1116@....com,
 akpm@...ux-foundation.org
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, stable@...r.kernel.org,
 21cnbao@...il.com, baolin.wang@...ux.alibaba.com,
 aneesh.kumar@...ux.ibm.com, liuzixing@...on.cn,
 Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: Re: [PATCH V4] mm/gup: Clear the LRU flag of a page before adding to
 LRU batch

On 26.03.25 13:42, Jinjiang Tu wrote:
> Hi,
> 

Hi!

> We noticed a 12.3% performance regression in the LibMicro pwrite testcase due to
> commit 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding to LRU batch").
> 
> The testcase is executed as follows, where the file is a tmpfs file:
>      pwrite -E -C 200 -L -S -W -N "pwrite_t1k" -s 1k -I 500 -f $TFILE

Do we know how much that reflects real workloads? (IOW, how much should 
we care)

> 
> This testcase writes 1KB (only one page) to the tmpfs file and repeats this step
> many times. The flame graph shows the performance regression comes from
> folio_mark_accessed() and workingset_activation().
> 
> folio_mark_accessed() is called on the same page many times. Before this patch, each
> call would add the page to cpu_fbatches.activate. When the fbatch was full, it was
> drained and the page was promoted to the active list. After that,
> folio_mark_accessed() did nothing in later calls.
> 
> But after this patch, the folio's LRU flag is cleared after it is added to
> cpu_fbatches.activate. From then on, folio_mark_accessed() never calls
> folio_activate() again because the page lacks the LRU flag, so the fbatch never
> becomes full and the folio is never marked active. Every later
> folio_mark_accessed() call therefore ends up in workingset_activation(), causing
> the performance regression.
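
For reference, a simplified sketch of the path in question in
folio_mark_accessed() (mm/swap.c); the real code has more branches
(MGLRU, unevictable, idle tracking), this is just to illustrate the
flow you describe:

/* Simplified sketch of folio_mark_accessed(), not the literal code. */
void folio_mark_accessed(struct folio *folio)
{
	if (!folio_test_referenced(folio)) {
		folio_set_referenced(folio);
	} else if (!folio_test_active(folio)) {
		/*
		 * After 33dfe9204f29, the folio sits in
		 * cpu_fbatches.activate with its LRU flag cleared, so
		 * this test keeps failing and folio_activate() is
		 * never reached again ...
		 */
		if (folio_test_lru(folio))
			folio_activate(folio);
		else
			folio_set_referenced(folio);
		folio_clear_referenced(folio);
		/* ... while this is paid on every subsequent call: */
		workingset_activation(folio);
	}
}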

Would there be a good place to drain the LRU to effectively get that 
processed? (we can always try draining if the LRU flag is not set)
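
Something along these lines, as an untested sketch of the idea (note
that lru_add_drain() only drains the local CPU's batches, which should
be enough for this single-threaded microbenchmark):

	} else if (!folio_test_active(folio)) {
		/*
		 * Not on the LRU: the folio may be stuck in a per-CPU
		 * batch with its LRU flag cleared; drain once so it
		 * can actually get activated.
		 */
		if (!folio_test_lru(folio))
			lru_add_drain();
		if (folio_test_lru(folio))
			folio_activate(folio);
		else
			folio_set_referenced(folio);
		folio_clear_referenced(folio);
		workingset_activation(folio);
	}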


-- 
Cheers,

David / dhildenb

