Message-ID: <076babae-9fc6-13f5-36a3-95dde0115f77@huawei.com>
Date: Thu, 27 Mar 2025 19:16:56 +0800
From: Jinjiang Tu <tujinjiang@...wei.com>
To: David Hildenbrand <david@...hat.com>, <yangge1116@....com>,
<akpm@...ux-foundation.org>
CC: <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
<stable@...r.kernel.org>, <21cnbao@...il.com>,
<baolin.wang@...ux.alibaba.com>, <aneesh.kumar@...ux.ibm.com>,
<liuzixing@...on.cn>, Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: Re: [PATCH V4] mm/gup: Clear the LRU flag of a page before adding to
LRU batch
On 2025/3/26 20:46, David Hildenbrand wrote:
> On 26.03.25 13:42, Jinjiang Tu wrote:
>> Hi,
>>
>
> Hi!
>
>> We noticed a 12.3% performance regression in the LibMicro pwrite
>> testcase due to commit 33dfe9204f29 ("mm/gup: clear the LRU flag of
>> a page before adding to LRU batch").
>>
>> The testcase is executed as follows, and the file is a tmpfs file.
>> pwrite -E -C 200 -L -S -W -N "pwrite_t1k" -s 1k -I 500 -f $TFILE
>
> Do we know how much that reflects real workloads? (IOW, how much
> should we care)
No, it's hard to say.
>
>>
>> This testcase writes 1KB (a single page) to the tmpfs file and
>> repeats this step many times. The flame graph shows that the
>> performance regression comes from folio_mark_accessed() and
>> workingset_activation().
>>
>> folio_mark_accessed() is called for the same page many times.
>> Before this patch, each call would add the page to
>> cpu_fbatches.activate. When the fbatch was full, it was drained and
>> the page was promoted to the active list. After that,
>> folio_mark_accessed() did nothing on later calls.
>>
>> But after this patch, the folio's LRU flag is cleared when it is
>> added to cpu_fbatches.activate. From then on, folio_mark_accessed()
>> never calls folio_activate() again because the page no longer has
>> the LRU flag, so the fbatch never fills up and the folio is never
>> marked active. As a result, every later folio_mark_accessed() call
>> takes the workingset_activation() path, leading to the performance
>> regression.
>
> Would there be a good place to drain the LRU to effectively get that
> processed? (we can always try draining if the LRU flag is not set)
Maybe we could search the local CPU's cpu_fbatches.activate in
__lru_cache_activate_folio()? Draining the other fbatches is meaningless.
>
>