linux-kernel - Re: [syzbot] [gfs2?] kernel panic: hung

Message-ID: <c4360163-3595-e152-765d-641f9c79e8fd@redhat.com>
Date:   Fri, 28 Jul 2023 06:48:47 -0500
From:   Bob Peterson <rpeterso@...hat.com>
To:     David Howells <dhowells@...hat.com>,
        syzbot <syzbot+607aa822c60b2e75b269@...kaller.appspotmail.com>
Cc:     agruenba@...hat.com, arnd@...db.de, cluster-devel@...hat.com,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        syzkaller-bugs@...glegroups.com, viro@...iv.linux.org.uk
Subject: Re: [syzbot] [gfs2?] kernel panic: hung_task: blocked tasks (2)

On 7/28/23 3:20 AM, David Howells wrote:
> syzbot <syzbot+607aa822c60b2e75b269@...kaller.appspotmail.com> wrote:
> 
>> Fixes: 9c8ad7a2ff0b ("uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]")
> 
> This would seem unlikely to be the culprit.  It just changes the numbering on
> the fsconfig-related syscalls.
> 
> Running the test program on v6.5-rc3, however, I end up with the test process
> stuck in the D state:
> 
> INFO: task repro-17687f1aa:5551 blocked for more than 120 seconds.
>        Not tainted 6.5.0-rc3-build3+ #1448
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:repro-17687f1aa state:D stack:0     pid:5551  ppid:5516   flags:0x00004002
> Call Trace:
>   <TASK>
>   __schedule+0x4a7/0x4f1
>   schedule+0x66/0xa1
>   schedule_timeout+0x9d/0xd7
>   ? __next_timer_interrupt+0xf6/0xf6
>   gfs2_gl_hash_clear+0xa0/0xdc
>   ? sugov_irq_work+0x15/0x15
>   gfs2_put_super+0x19f/0x1d3
>   generic_shutdown_super+0x78/0x187
>   kill_block_super+0x1c/0x32
>   deactivate_locked_super+0x2f/0x61
>   cleanup_mnt+0xab/0xcc
>   task_work_run+0x6b/0x80
>   exit_to_user_mode_prepare+0x76/0xfd
>   syscall_exit_to_user_mode+0x14/0x31
>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f89aac31dab
> RSP: 002b:00007fff43d9b878 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
> RAX: 0000000000000000 RBX: 00007fff43d9cad8 RCX: 00007f89aac31dab
> RDX: 0000000000000000 RSI: 000000000000000a RDI: 00007fff43d9b920
> RBP: 00007fff43d9c960 R08: 0000000000000000 R09: 0000000000000073
> R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> R13: 00007fff43d9cae8 R14: 0000000000417e18 R15: 00007f89aad51000
>   </TASK>
> 
> David
> 
Hi David,

This indicates gfs2 is having trouble resolving and freeing all its
glocks, which usually means a reference counting problem or an AIL
(active items list) problem during unmount.

If gfs2_gl_hash_clear gets stuck for a long period of time it is 
supposed to dump the remaining list of glocks that still have not been 
resolved. I think it takes 10 minutes or so. Can you post the console 
messages that follow? That will help us figure out what's happening. Thanks.
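
For anyone collecting that output, here is a minimal Python sketch for pulling the glock dump out of a saved console log. It is not part of gfs2. It assumes the dump records show up as "G:" (glock), "H:" (holder), "I:" (inode) or "R:" (resource group) lines followed by a one-letter field such as "s:"; the function name extract_glock_dump and the pattern are illustrative, so adjust them to whatever your console actually prints.

```python
import re

# Assumption: a dump record is a single letter (G, H, I or R) plus a colon,
# followed by a one-letter field such as "s:". Register lines like "RIP:" or
# "RAX:" do not match because their tag is longer than one letter.
GLOCK_RECORD = re.compile(r"(?:^|\s)[GHIR]:\s+[a-z]:")


def extract_glock_dump(log_text):
    """Return the lines of log_text that look like glock dump records."""
    return [line for line in log_text.splitlines() if GLOCK_RECORD.search(line)]


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for record in extract_glock_dump(sys.stdin.read()):
        print(record)
```

Piping the saved console output through this leaves only the candidate glock records, which are the lines worth posting.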

Regards,

Bob Peterson