Message-ID: <20230803134106.GA130558@ziqianlu-dell>
Date:   Thu, 3 Aug 2023 21:41:06 +0800
From:   Aaron Lu <aaron.lu@...el.com>
To:     Bagas Sanjaya <bagasdotme@...il.com>,
        Colin Ian King <colin.i.king@...il.com>
CC:     Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: Fwd: crash/hang in mm/swapfile.c:718 add_to_avail_list when
 exercising stress-ng

On Thu, Aug 03, 2023 at 02:06:46PM +0800, Aaron Lu wrote:
> On Wed, Aug 02, 2023 at 07:54:38PM +0700, Bagas Sanjaya wrote:
> > Hi,
> > 
> > I notice a bug report on Bugzilla [1]. Quoting from it:
> >
> > > How to reproduce:
> > > 
> > > Had 24 CPU Alderlake 16GB debian12 system running with default kernel (from makecondig) on 6.5-rc4, exercised with no swap to start with.
> > > 
> > > using stress-ng tip commit 0f2ef02e9bc5abb3419c44be056d5fa3c97e0137
> > > (see https://github.com/ColinIanKing/stress-ng )
> > > 
> > > build and run stress-ng for say 60 minutes:
> > > 
> > > ./stress-ng --cpu-online 50 --brk 50 --swap 50 --vmstat 1 -t 60m
> > > 
> > > Will hang in mm/swapfile.c:718 add_to_avail_list+0x93/0xa0
> > > 
> > > See attached file for an image of the console on the hang (I'm trying to get the full stack dump).
> > 
> > See Bugzilla for the full thread and attached console image.
> > 
> > FWIW, I have to forward this bug report to the mailing lists because
> > Thorsten noted that many developers don't take a look on Bugzilla
> > (see the BZ thread).
> 
> Thanks.
> 
> I can reproduce this issue using below cmdline:
> $ sudo ./stress-ng --brk 50 --swap 5 --vmstat 1 -t 60m
> 
> I'll investigate what is happening.

Hi Colin,

Can you try the diff below on top of v6.5-rc4? It works for me here,
although I got the warning in a different place, in get_swap_pages():

                        WARN(!si->highest_bit,
                             "swap_info %d in list but !highest_bit\n",
                             si->type);

I think the warning you got in add_to_avail_list(), triggered because the
swap device is already in the list, has a similar cause; see the
explanation below.

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 8e6dde68b389..cb7e93ec1933 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2330,7 +2330,8 @@ static void _enable_swap_info(struct swap_info_struct *p)
 	 * swap_info_struct.
 	 */
 	plist_add(&p->list, &swap_active_head);
-	add_to_avail_list(p);
+	if (p->highest_bit)
+		add_to_avail_list(p);
 }
 
 static void enable_swap_info(struct swap_info_struct *p, int prio,

The finding is: if swapoff fails on a swap device, the device goes through
reinsert_swap_info() -> _enable_swap_info() -> add_to_avail_list(). The
problem is that this swap device may have run out of space, with its
highest_bit being 0, and so shouldn't be added to the avail list. In your
case, once its highest_bit becomes non-zero, it goes through
add_to_avail_list() again and, since it's already in the list, triggers
the warning.
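
To make the sequence above concrete, here is a small userspace sketch of
it. This is not kernel code: every toy_* name is made up for illustration,
the avail list is reduced to a single flag, and toy_free_slot() only
assumes that freeing a slot on a full device sets highest_bit and re-adds
the device to the avail list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for struct swap_info_struct (hypothetical, heavily reduced). */
struct toy_swap_info {
	unsigned int highest_bit;	/* 0 models "no free slot left" */
	bool on_avail_list;		/* models membership in the avail list */
};

static int toy_warnings;

/* Models add_to_avail_list(): adding a device that is already listed warns. */
static void toy_add_to_avail_list(struct toy_swap_info *si)
{
	assert(si != NULL);
	if (si->on_avail_list) {
		toy_warnings++;
		fprintf(stderr, "toy warning: device already on avail list\n");
		return;
	}
	si->on_avail_list = true;
}

/* Models _enable_swap_info(); 'guarded' selects the check from the diff. */
static void toy_enable_swap_info(struct toy_swap_info *si, bool guarded)
{
	if (!guarded || si->highest_bit)
		toy_add_to_avail_list(si);
}

/* Assumed behaviour: freeing a slot on a full device makes it available. */
static void toy_free_slot(struct toy_swap_info *si, unsigned int offset)
{
	bool was_full = (si->highest_bit == 0);

	si->highest_bit = offset;
	if (was_full)
		toy_add_to_avail_list(si);
}

/* Replays: failed swapoff re-enables a full device, then a slot is freed. */
int toy_simulate(bool guarded)
{
	struct toy_swap_info si = { .highest_bit = 0, .on_avail_list = false };

	toy_warnings = 0;
	toy_enable_swap_info(&si, guarded);
	toy_free_slot(&si, 42);
	return toy_warnings;
}
```

Calling toy_simulate(false) replays the unpatched path, where the full
device is put on the list and then added a second time once a slot is
freed; toy_simulate(true) replays the path with the highest_bit check.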

If it works for you, I'll prepare a patch. Thanks.
