lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8fbd6f22-2086-4cf7-923f-ac95688c8ddf@amd.com>
Date: Mon, 24 Mar 2025 21:55:45 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Mateusz Guzik <mjguzik@...il.com>
CC: Oleg Nesterov <oleg@...hat.com>, syzbot
	<syzbot+62262fdc0e01d99573fc@...kaller.appspotmail.com>,
	<brauner@...nel.org>, <dhowells@...hat.com>, <jack@...e.cz>,
	<jlayton@...nel.org>, <linux-fsdevel@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <netfs@...ts.linux.dev>,
	<swapnil.sapkal@....com>, <syzkaller-bugs@...glegroups.com>,
	<viro@...iv.linux.org.uk>
Subject: Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter

Hello Mateusz,

On 3/24/2025 9:33 PM, Mateusz Guzik wrote:
> On Mon, Mar 24, 2025 at 3:52 PM K Prateek Nayak <kprateek.nayak@....com> wrote:
>> So far, with tracing, this is where I'm:
>>
>> o Mainline + Oleg's optimization reverted:
>>
>>       ...
>>       kworker/43:1-1723    [043] .....   115.309065: p9_read_work: Data read wait 55
>>       kworker/43:1-1723    [043] .....   115.309066: p9_read_work: Data read 55
>>       kworker/43:1-1723    [043] .....   115.309067: p9_read_work: Data read wait 7
>>       kworker/43:1-1723    [043] .....   115.309068: p9_read_work: Data read 7
>>              repro-4138    [043] .....   115.309084: netfs_wake_write_collector: Wake collector
>>              repro-4138    [043] .....   115.309085: netfs_wake_write_collector: Queuing collector work
>>              repro-4138    [043] .....   115.309088: netfs_unbuffered_write: netfs_unbuffered_write
>>              repro-4138    [043] .....   115.309088: netfs_end_issue_write: netfs_end_issue_write
>>              repro-4138    [043] .....   115.309089: netfs_end_issue_write: Write collector need poke 0
>>              repro-4138    [043] .....   115.309091: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!
>>    kworker/u1030:1-1951    [168] .....   115.309096: netfs_wake_write_collector: Wake collector
>>    kworker/u1030:1-1951    [168] .....   115.309097: netfs_wake_write_collector: Queuing collector work
>>    kworker/u1030:1-1951    [168] .....   115.309102: netfs_write_collection_worker: Write collect clearing and waking up!
>>       ... (syzbot reproducer continues)
>>
>> o Mainline:
>>
>>      kworker/185:1-1767    [185] .....   109.485961: p9_read_work: Data read wait 7
>>      kworker/185:1-1767    [185] .....   109.485962: p9_read_work: Data read 7
>>      kworker/185:1-1767    [185] .....   109.485962: p9_read_work: Data read wait 55
>>      kworker/185:1-1767    [185] .....   109.485963: p9_read_work: Data read 55
>>              repro-4038    [185] .....   114.225717: netfs_wake_write_collector: Wake collector
>>              repro-4038    [185] .....   114.225723: netfs_wake_write_collector: Queuing collector work
>>              repro-4038    [185] .....   114.225727: netfs_unbuffered_write: netfs_unbuffered_write
>>              repro-4038    [185] .....   114.225727: netfs_end_issue_write: netfs_end_issue_write
>>              repro-4038    [185] .....   114.225728: netfs_end_issue_write: Write collector need poke 0
>>              repro-4038    [185] .....   114.225728: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!
>>      ... (syzbot reproducer hangs)
>>
>> There is a third "kworker/u1030" component that never gets woken up for
>> reasons currently unknown to me with Oleg's optimization. I'll keep
>> digging.
>>
> 
> Thanks for the update.
> 
> It is unclear to me if you checked, so I'm going to have to ask just
> in case: when there is a hang, is there *anyone* stuck in pipe code
> (and if so, where)?
> 
> You can get the kernel to print stacks for all threads with sysrq:
> echo t > /proc/sysrq-trigger

This dumps an insane amount of stuff on my console. Let me search if there
is any reference to pipe somewhere in there. Meanwhile, for the reproducer
threads themself, they are at:

# ps aux | grep repro
root        4245  0.0  0.0  19040     0 ?        D    16:10   0:00 ./repro
root        4306  0.0  0.0  19172     0 pts/5    S+   16:13   0:00 ./repro
root        4307  0.0  0.0  19040     0 pts/5    D    16:13   0:00 ./repro

# cat /proc/4306/stack
[<0>] do_wait+0xa9/0x100
[<0>] kernel_wait4+0xa9/0x150
[<0>] do_syscall_64+0x6f/0x110
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

# cat /proc/4307/stack
[<0>] do_lock_mount+0x37/0x200
[<0>] path_mount+0x876/0xae0
[<0>] __x64_sys_mount+0x103/0x130
[<0>] do_syscall_64+0x6f/0x110
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

# cat /proc/4245/stack
[<0>] netfs_unbuffered_write_iter_locked+0x222/0x2b0 [netfs]
[<0>] netfs_unbuffered_write_iter+0x12d/0x1f0 [netfs]
[<0>] vfs_write+0x307/0x420
[<0>] ksys_write+0x66/0xe0
[<0>] do_syscall_64+0x6f/0x110
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

For kworkers, I did:

# for i in $(ps aux | grep kworker | tr -s ' ' | cut -d ' ' -f2); do cat /proc/$i/stack | grep pipe; done

and I got no output. Same for the string "wait".

> 

-- 
Thanks and Regards,
Prateek


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ