Message-ID: <26870d6f-8bb9-44de-9d1f-dcb1b5a93eae@redhat.com>
Date: Tue, 1 Apr 2025 16:33:20 +0200
From: David Hildenbrand <david@...hat.com>
To: Jinjiang Tu <tujinjiang@...wei.com>, yangge1116@....com,
 akpm@...ux-foundation.org
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, stable@...r.kernel.org,
 21cnbao@...il.com, baolin.wang@...ux.alibaba.com,
 aneesh.kumar@...ux.ibm.com, liuzixing@...on.cn,
 Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: Re: [PATCH V4] mm/gup: Clear the LRU flag of a page before adding to
 LRU batch

On 27.03.25 12:16, Jinjiang Tu wrote:
> 
> On 2025/3/26 20:46, David Hildenbrand wrote:
>> On 26.03.25 13:42, Jinjiang Tu wrote:
>>> Hi,
>>>
>>
>> Hi!
>>
>>> We noticed a 12.3% performance regression in the LibMicro pwrite
>>> testcase due to
>>> commit 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before
>>> adding to LRU batch").
>>>
>>> The testcase is executed as follows, and the file is a tmpfs file.
>>>       pwrite -E -C 200 -L -S -W -N "pwrite_t1k" -s 1k -I 500 -f $TFILE
>>
>> Do we know how much that reflects real workloads? (IOW, how much
>> should we care)
> 
> No, it's hard to say.
> 
>>
>>>
>>> This testcase writes 1KB (only one page) to the tmpfs file and repeats
>>> this step many times. The flame graph shows the performance regression
>>> comes from folio_mark_accessed() and workingset_activation().
>>>
>>> folio_mark_accessed() is called many times for the same page.
>>> Before this patch, each call would add the page to
>>> cpu_fbatches.activate. When the fbatch was full, it was drained and
>>> the page was promoted to the active list. After that,
>>> folio_mark_accessed() did nothing in later calls.
>>>
>>> But after this patch, the folio's LRU flag is cleared when it is added
>>> to cpu_fbatches.activate. From then on, folio_mark_accessed() will
>>> never call folio_activate() again because the page has no LRU flag, so
>>> the fbatch will not get full and the folio will not be marked active.
>>> Later folio_mark_accessed() calls will therefore always call
>>> workingset_activation(), leading to the performance regression.
>>
>> Would there be a good place to drain the LRU to effectively get that
>> processed? (we can always try draining if the LRU flag is not set)
> 
> Maybe we could search the cpu_fbatches.activate of the local CPU in __lru_cache_activate_folio()? Draining the other fbatches is meaningless.

So the current behavior is that folio_mark_accessed() will end up calling folio_activate()
once, and then __lru_cache_activate_folio() until the LRU is drained (which can
take a looong time).

The old behavior was that folio_mark_accessed() would keep calling folio_activate() until
the LRU was drained, simply because it was full of "all the same pages". Only *after*
the LRU was drained would folio_mark_accessed() actually not do anything (desired behavior).
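
For reference, the decision between those two paths lives in folio_mark_accessed().
A rough sketch of that logic, paraphrased from memory of mm/swap.c and simplified
(lru_gen and idle-flag handling omitted; not an exact quote of the current tree):

/*
 * Sketch only: paraphrased and simplified, not the exact upstream code.
 */
void folio_mark_accessed(struct folio *folio)
{
	if (!folio_test_referenced(folio)) {
		folio_set_referenced(folio);
	} else if (folio_test_unevictable(folio)) {
		/* Marking an unevictable folio accessed has no effect. */
	} else if (!folio_test_active(folio)) {
		/*
		 * If the folio still has the LRU flag, queue it for
		 * activation via cpu_fbatches.activate; otherwise assume it
		 * sits in a per-CPU batch and search that batch instead.
		 */
		if (folio_test_lru(folio))
			folio_activate(folio);
		else
			__lru_cache_activate_folio(folio);
		folio_clear_referenced(folio);
		workingset_activation(folio);
	}
}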

So the overhead comes primarily from __lru_cache_activate_folio() repeatedly searching
through the list, because the LRU gets drained less frequently; and in this case it
would never find the folio in there anyway.
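
For completeness, a rough sketch of that search in __lru_cache_activate_folio(), again
paraphrased from memory of mm/swap.c rather than quoted exactly. It only walks the local
CPU's lru_add batch, so a folio that is staged in cpu_fbatches.activate is never found
here:

/* Sketch only: paraphrased, not an exact quote of the current tree. */
static void __lru_cache_activate_folio(struct folio *folio)
{
	struct folio_batch *fbatch;
	int i;

	local_lock(&cpu_fbatches.lock);
	fbatch = this_cpu_ptr(&cpu_fbatches.lru_add);

	/*
	 * Search backwards on the assumption that the folio being activated
	 * was only recently added to the batch.
	 */
	for (i = folio_batch_count(fbatch) - 1; i >= 0; i--) {
		struct folio *batch_folio = fbatch->folios[i];

		if (batch_folio == folio) {
			folio_set_active(folio);
			break;
		}
	}

	local_unlock(&cpu_fbatches.lock);
}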

So ... it used to be suboptimal before, now it's more suboptimal I guess?! :)

We'd need a way to better identify "this folio is already queued for activation". Searching
that list as well would be one option, but the whole "search the list" business is nasty.

Maybe we can simply set the folio as active when staging it for activation, after having
cleared the LRU flag? We already do that during folio_add.

As the LRU flag has been cleared, nobody should be messing with that folio until the cache
is drained and the move has succeeded.


Pretty sure this doesn't work, but just to throw out an idea:

 From c26e1c0ceda6c818826e5b89dc7c7c9279138f80 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@...hat.com>
Date: Tue, 1 Apr 2025 16:31:56 +0200
Subject: [PATCH] test

Signed-off-by: David Hildenbrand <david@...hat.com>
---
  mm/swap.c | 21 ++++++++++++++++-----
  1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index fc8281ef42415..bbf9aa76db87f 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -175,6 +175,8 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
  	folios_put(fbatch);
  }
  
+static void lru_activate(struct lruvec *lruvec, struct folio *folio);
+
  static void __folio_batch_add_and_move(struct folio_batch __percpu *fbatch,
  		struct folio *folio, move_fn_t move_fn,
  		bool on_lru, bool disable_irq)
@@ -191,6 +193,10 @@ static void __folio_batch_add_and_move(struct folio_batch __percpu *fbatch,
  	else
  		local_lock(&cpu_fbatches.lock);
  
+	/* We'll only perform the actual list move deferred. */
+	if (move_fn == lru_activate)
+		folio_set_active(folio);
+
  	if (!folio_batch_add(this_cpu_ptr(fbatch), folio) || folio_test_large(folio) ||
  	    lru_cache_disabled())
  		folio_batch_move_lru(this_cpu_ptr(fbatch), move_fn);
@@ -299,12 +305,14 @@ static void lru_activate(struct lruvec *lruvec, struct folio *folio)
  {
  	long nr_pages = folio_nr_pages(folio);
  
-	if (folio_test_active(folio) || folio_test_unevictable(folio))
-		return;
-
+	/*
+	 * We set the folio active after clearing the LRU flag, and set the
+	 * LRU flag only after moving it to the right list.
+	 */
+	VM_WARN_ON_ONCE(!folio_test_active(folio));
+	VM_WARN_ON_ONCE(folio_test_unevictable(folio));
  
  	lruvec_del_folio(lruvec, folio);
-	folio_set_active(folio);
  	lruvec_add_folio(lruvec, folio);
  	trace_mm_lru_activate(folio);
  
@@ -342,7 +350,10 @@ void folio_activate(struct folio *folio)
  		return;
  
  	lruvec = folio_lruvec_lock_irq(folio);
-	lru_activate(lruvec, folio);
+	if (!folio_test_unevictable(folio)) {
+		folio_set_active(folio);
+		lru_activate(lruvec, folio);
+	}
  	unlock_page_lruvec_irq(lruvec);
  	folio_set_lru(folio);
  }
-- 
2.48.1


-- 
Cheers,

David / dhildenb

