Message-ID: <076babae-9fc6-13f5-36a3-95dde0115f77@huawei.com>
Date: Thu, 27 Mar 2025 19:16:56 +0800
From: Jinjiang Tu <tujinjiang@...wei.com>
To: David Hildenbrand <david@...hat.com>, <yangge1116@....com>,
<akpm@...ux-foundation.org>
CC: <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
<stable@...r.kernel.org>, <21cnbao@...il.com>,
<baolin.wang@...ux.alibaba.com>, <aneesh.kumar@...ux.ibm.com>,
<liuzixing@...on.cn>, Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: Re: [PATCH V4] mm/gup: Clear the LRU flag of a page before adding to
LRU batch
On 2025/3/26 20:46, David Hildenbrand wrote:
> On 26.03.25 13:42, Jinjiang Tu wrote:
>> Hi,
>>
>
> Hi!
>
>> We noticed a 12.3% performance regression in the LibMicro pwrite
>> testcase due to commit 33dfe9204f29 ("mm/gup: clear the LRU flag of
>> a page before adding to LRU batch").
>>
>> The testcase is executed as follows, and the file is a tmpfs file.
>> pwrite -E -C 200 -L -S -W -N "pwrite_t1k" -s 1k -I 500 -f $TFILE
>
> Do we know how much that reflects real workloads? (IOW, how much
> should we care)
No, it's hard to say.
>
>>
>> This testcase writes 1KB (a single page) to the tmpfs file and
>> repeats this step many times. The flame graph shows that the
>> performance regression comes from folio_mark_accessed() and
>> workingset_activation().
>>
>> folio_mark_accessed() is called for the same page many times.
>> Before this patch, each call would add the page to
>> cpu_fbatches.activate. When the fbatch was full, it was drained and
>> the page was promoted to the active list. After that,
>> folio_mark_accessed() did nothing on later calls.
>>
>> But after this patch, the folio's LRU flag is cleared when it is
>> added to cpu_fbatches.activate. From then on, folio_mark_accessed()
>> never calls folio_activate() again because the page no longer has
>> the LRU flag, so the fbatch never fills up and the folio is never
>> marked active. As a result, every later folio_mark_accessed() call
>> takes the workingset_activation() path, leading to the performance
>> regression.
>
> Would there be a good place to drain the LRU to effectively get that
> processed? (we can always try draining if the LRU flag is not set)
Maybe we could search the local CPU's cpu_fbatches.activate in
__lru_cache_activate_folio()? Draining the other fbatches is meaningless.
>
>