Message-ID: <87d361ca-7512-4451-016e-6c2e3cec2bfe@gmail.com>
Date: Thu, 3 Aug 2023 15:36:17 +0100
From: "Colin King (gmail)" <colin.i.king@...il.com>
To: Aaron Lu <aaron.lu@...el.com>, Bagas Sanjaya <bagasdotme@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: Fwd: crash/hang in mm/swapfile.c:718 add_to_avail_list when
exercising stress-ng
Hi Aaron,
Thanks for the speedy fix. I've run a couple of 10-minute soak tests
with the fix applied and can't reproduce the issue, so it looks good to
me. Please add:
Tested-by: Colin Ian King <colin.i.king@...il.com>
Colin
On 03/08/2023 14:41, Aaron Lu wrote:
> On Thu, Aug 03, 2023 at 02:06:46PM +0800, Aaron Lu wrote:
>> On Wed, Aug 02, 2023 at 07:54:38PM +0700, Bagas Sanjaya wrote:
>>> Hi,
>>>
>>> I notice a bug report on Bugzilla [1]. Quoting from it:
>>>
>>>> How to reproduce:
>>>>
>>>> Had 24 CPU Alderlake 16GB debian12 system running with default kernel (from makecondig) on 6.5-rc4, exercised with no swap to start with.
>>>>
>>>> using stress-ng tip commit 0f2ef02e9bc5abb3419c44be056d5fa3c97e0137
>>>> (see https://github.com/ColinIanKing/stress-ng )
>>>>
>>>> build and run stress-ng for say 60 minutes:
>>>>
>>>> ./stress-ng --cpu-online 50 --brk 50 --swap 50 --vmstat 1 -t 60m
>>>>
>>>> Will hang in mm/swapfile.c:718 add_to_avail_list+0x93/0xa0
>>>>
>>>> See attached file for an image of the console on the hang (I'm trying to get the full stack dump).
>>>
>>> See Bugzilla for the full thread and attached console image.
>>>
>>> FWIW, I have to forward this bug report to the mailing lists because
>>> Thorsten noted that many developers don't look at Bugzilla
>>> (see the BZ thread).
>>
>> Thanks.
>>
>> I can reproduce this issue using the command line below:
>> $ sudo ./stress-ng --brk 50 --swap 5 --vmstat 1 -t 60m
>>
>> I'll investigate what is happening.
>
> Hi Colin,
>
> Can you try the diff below on top of v6.5-rc4? It works for me here,
> although I got the warning in a different place, in get_swap_pages():
>
> 	WARN(!si->highest_bit,
> 	     "swap_info %d in list but !highest_bit\n",
> 	     si->type);
>
> I think the warning you got in add_to_avail_list(), caused by the swap
> device already being in the list, is similar; see the explanation below.
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 8e6dde68b389..cb7e93ec1933 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2330,7 +2330,8 @@ static void _enable_swap_info(struct swap_info_struct *p)
>  	 * swap_info_struct.
>  	 */
>  	plist_add(&p->list, &swap_active_head);
> -	add_to_avail_list(p);
> +	if (p->highest_bit)
> +		add_to_avail_list(p);
>  }
>
> static void enable_swap_info(struct swap_info_struct *p, int prio,
>
> The finding is that if swapoff fails for a swap device, the device goes
> back through reinsert_swap_info() -> _enable_swap_info() ->
> add_to_avail_list(). The problem is that this swap device may have run
> out of space, with its highest_bit being 0, and so shouldn't be added to
> the avail list. In your case, once its highest_bit becomes non-zero, it
> goes through add_to_avail_list() again, and since it's already in the
> list, you get the warning.
>
> If it works for you, I'll prepare a patch. Thanks.
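
For readers following the thread, the double-add sequence described above
can be sketched as a small userspace toy model. This is not the kernel
code: every toy_* name is hypothetical, list membership is reduced to a
boolean, and the slot-freeing path is an assumption based only on the
statement that a device goes through add_to_avail_list() once its
highest_bit becomes non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for struct swap_info_struct. */
struct toy_swap_dev {
	unsigned int highest_bit;	/* 0 models "no free slot left" */
	bool on_avail_list;		/* models membership in the avail list */
};

/* Counts how often an add found the device already on the list. */
static int toy_double_add_warnings;

/* Models add_to_avail_list(): adding a device twice is the bug. */
static void toy_add_to_avail_list(struct toy_swap_dev *p)
{
	if (p->on_avail_list) {
		toy_double_add_warnings++;	/* stands in for the kernel warning */
		return;
	}
	p->on_avail_list = true;
}

/*
 * Models _enable_swap_info(). With guarded == true it applies the check
 * from the proposed diff: a device with highest_bit == 0 is not added.
 */
static void toy_enable_swap_info(struct toy_swap_dev *p, bool guarded)
{
	if (!guarded || p->highest_bit)
		toy_add_to_avail_list(p);
}

/*
 * Assumed model of a slot being freed: a device that was full gets a
 * non-zero highest_bit and is put on the avail list.
 */
static void toy_free_slot(struct toy_swap_dev *p, unsigned int offset)
{
	bool was_full = !p->highest_bit;

	if (offset > p->highest_bit)
		p->highest_bit = offset;
	if (was_full)
		toy_add_to_avail_list(p);
}

/* Replays the sequence from the thread and returns the warning count. */
static int toy_simulate(bool guarded)
{
	struct toy_swap_dev dev = { .highest_bit = 0, .on_avail_list = false };

	toy_double_add_warnings = 0;
	toy_enable_swap_info(&dev, guarded);	/* failed swapoff re-enables a full device */
	toy_free_slot(&dev, 42);		/* later, space is freed on that device */
	return toy_double_add_warnings;
}
```

In this model, toy_simulate(false) returns 1 (the full device is added on
re-enable and again when a slot is freed), while toy_simulate(true)
returns 0 because the highest_bit check skips the first add.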