Message-ID: <c4360163-3595-e152-765d-641f9c79e8fd@redhat.com>
Date: Fri, 28 Jul 2023 06:48:47 -0500
From: Bob Peterson <rpeterso@...hat.com>
To: David Howells <dhowells@...hat.com>,
syzbot <syzbot+607aa822c60b2e75b269@...kaller.appspotmail.com>
Cc: agruenba@...hat.com, arnd@...db.de, cluster-devel@...hat.com,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
syzkaller-bugs@...glegroups.com, viro@...iv.linux.org.uk
Subject: Re: [syzbot] [gfs2?] kernel panic: hung_task: blocked tasks (2)

On 7/28/23 3:20 AM, David Howells wrote:
> syzbot <syzbot+607aa822c60b2e75b269@...kaller.appspotmail.com> wrote:
>
>> Fixes: 9c8ad7a2ff0b ("uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]")
>
> This would seem unlikely to be the culprit. It just changes the numbering on
> the fsconfig-related syscalls.
>
> Running the test program on v6.5-rc3, however, I end up with the test process
> stuck in the D state:
>
> INFO: task repro-17687f1aa:5551 blocked for more than 120 seconds.
> Not tainted 6.5.0-rc3-build3+ #1448
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:repro-17687f1aa state:D stack:0 pid:5551 ppid:5516 flags:0x00004002
> Call Trace:
> <TASK>
> __schedule+0x4a7/0x4f1
> schedule+0x66/0xa1
> schedule_timeout+0x9d/0xd7
> ? __next_timer_interrupt+0xf6/0xf6
> gfs2_gl_hash_clear+0xa0/0xdc
> ? sugov_irq_work+0x15/0x15
> gfs2_put_super+0x19f/0x1d3
> generic_shutdown_super+0x78/0x187
> kill_block_super+0x1c/0x32
> deactivate_locked_super+0x2f/0x61
> cleanup_mnt+0xab/0xcc
> task_work_run+0x6b/0x80
> exit_to_user_mode_prepare+0x76/0xfd
> syscall_exit_to_user_mode+0x14/0x31
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f89aac31dab
> RSP: 002b:00007fff43d9b878 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
> RAX: 0000000000000000 RBX: 00007fff43d9cad8 RCX: 00007f89aac31dab
> RDX: 0000000000000000 RSI: 000000000000000a RDI: 00007fff43d9b920
> RBP: 00007fff43d9c960 R08: 0000000000000000 R09: 0000000000000073
> R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> R13: 00007fff43d9cae8 R14: 0000000000417e18 R15: 00007f89aad51000
> </TASK>
>
> David
>
Hi David,

This indicates gfs2 is having trouble resolving and freeing all its
glocks, which usually means a reference counting problem or an AIL
(active items list) problem during unmount.

If gfs2_gl_hash_clear gets stuck for a long period of time, it is
supposed to dump the list of remaining glocks that still have not been
resolved. I think it takes 10 minutes or so. Can you post the console
messages that follow? That will help us figure out what's happening. Thanks.

Regards,

Bob Peterson