lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zq_0X04WsqgUnz30@google.com>
Date: Sun, 4 Aug 2024 15:36:31 -0600
From: Yu Zhao <yuzhao@...gle.com>
To: Hugh Dickins <hughd@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Barry Song <21cnbao@...il.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH mm-unstable v1 5/5] mm/swap: remove boilerplate

On Sat, Aug 03, 2024 at 11:55:51PM -0700, Hugh Dickins wrote:
> On Wed, 10 Jul 2024, Yu Zhao wrote:
> 
> > Remove boilerplate by using a macro to choose the corresponding lock
> > and handler for each folio_batch in cpu_fbatches.
> > 
> > Signed-off-by: Yu Zhao <yuzhao@...gle.com>
> 
> Andrew, please revert this "remove boilerplate" patch (and of course its
> followup fix) from mm-unstable. From the title I presume it was intended
> to make no functional change, but that's far from so.
> 
> Under tmpfs swapping load, on different runs I see various badnesses:
> "Bad page" in __free_one_page(), Oops in __run_timer_base(),
> WARNING at kernel/workqueue.c:790 in set_work_data(), PageBuddy BUG
> at page-flags.h:1009 from __del_page_from_freelist(), something (I'd
> given up taking better notes by this time) in __queue_work(), others.
> 
> All those were including the fix to Barry's report: without that fix,
> the boot is drowned in warnings scrolling past too fast to be read.
> 
> (All the above were on the HP workstation, swapping to SSD; whereas on
> this ThinkPad, swapping to NVMe, no problem seen at all - I mention the
> swapping medium, but have no idea whether that's a relevant difference.
> In each case, MGLRU compiled in but not enabled. THPs and 64kTHPs active.)
> 
> Sorry, but I've put no effort whatsoever into debugging this: "remove
> boilerplate" didn't seem worth the effort, and my personal preference
> is for readable boilerplate over hiding in a macro.  If you prefer the
> macro, I expect Yu can soon come up with a fix (which I could test here):
> but for now please revert "remove boilerplate", since its issues get in
> the way of further mm testing.

Sorry for getting in your way, Hugh.

Apparently I didn't expect local_lock_t to be zero length, i.e., when
CONFIG_DEBUG_LOCK_ALLOC is not set. So that might explain why you only
had problems with one of the two machines, where it failed to disable
IRQ when rotating clean pages after writeback.

The following should fix it, in case you want to verify the above:

diff --git a/mm/swap.c b/mm/swap.c
index 4bc08352ad87..67a246772811 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -254,7 +254,7 @@ static void __folio_batch_add_and_move(struct folio_batch __percpu *fbatch,
 		folio,										\
 		op,										\
 		on_lru,										\
-		offsetof(struct cpu_fbatches, op) > offsetof(struct cpu_fbatches, lock_irq)	\
+		offsetof(struct cpu_fbatches, op) >= offsetof(struct cpu_fbatches, lock_irq)	\
 	)
 
 static void lru_move_tail(struct lruvec *lruvec, struct folio *folio)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ