Message-ID: <569a9447-aec8-4afa-a709-caac6c6804fa@amd.com>
Date: Mon, 24 Mar 2025 22:06:20 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Mateusz Guzik <mjguzik@...il.com>
CC: Oleg Nesterov <oleg@...hat.com>, syzbot
	<syzbot+62262fdc0e01d99573fc@...kaller.appspotmail.com>,
	<brauner@...nel.org>, <dhowells@...hat.com>, <jack@...e.cz>,
	<jlayton@...nel.org>, <linux-fsdevel@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <netfs@...ts.linux.dev>,
	<swapnil.sapkal@....com>, <syzkaller-bugs@...glegroups.com>,
	<viro@...iv.linux.org.uk>
Subject: Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter



On 3/24/2025 9:55 PM, K Prateek Nayak wrote:
> Hello Mateusz,
> 
> On 3/24/2025 9:33 PM, Mateusz Guzik wrote:
>> On Mon, Mar 24, 2025 at 3:52 PM K Prateek Nayak <kprateek.nayak@....com> wrote:
>>> So far, with tracing, this is where I'm:
>>>
>>> o Mainline + Oleg's optimization reverted:
>>>
>>>       ...
>>>       kworker/43:1-1723    [043] .....   115.309065: p9_read_work: Data read wait 55
>>>       kworker/43:1-1723    [043] .....   115.309066: p9_read_work: Data read 55
>>>       kworker/43:1-1723    [043] .....   115.309067: p9_read_work: Data read wait 7
>>>       kworker/43:1-1723    [043] .....   115.309068: p9_read_work: Data read 7
>>>              repro-4138    [043] .....   115.309084: netfs_wake_write_collector: Wake collector
>>>              repro-4138    [043] .....   115.309085: netfs_wake_write_collector: Queuing collector work
>>>              repro-4138    [043] .....   115.309088: netfs_unbuffered_write: netfs_unbuffered_write
>>>              repro-4138    [043] .....   115.309088: netfs_end_issue_write: netfs_end_issue_write
>>>              repro-4138    [043] .....   115.309089: netfs_end_issue_write: Write collector need poke 0
>>>              repro-4138    [043] .....   115.309091: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!
>>>    kworker/u1030:1-1951    [168] .....   115.309096: netfs_wake_write_collector: Wake collector
>>>    kworker/u1030:1-1951    [168] .....   115.309097: netfs_wake_write_collector: Queuing collector work
>>>    kworker/u1030:1-1951    [168] .....   115.309102: netfs_write_collection_worker: Write collect clearing and waking up!
>>>       ... (syzbot reproducer continues)
>>>
>>> o Mainline:
>>>
>>>      kworker/185:1-1767    [185] .....   109.485961: p9_read_work: Data read wait 7
>>>      kworker/185:1-1767    [185] .....   109.485962: p9_read_work: Data read 7
>>>      kworker/185:1-1767    [185] .....   109.485962: p9_read_work: Data read wait 55
>>>      kworker/185:1-1767    [185] .....   109.485963: p9_read_work: Data read 55
>>>              repro-4038    [185] .....   114.225717: netfs_wake_write_collector: Wake collector
>>>              repro-4038    [185] .....   114.225723: netfs_wake_write_collector: Queuing collector work
>>>              repro-4038    [185] .....   114.225727: netfs_unbuffered_write: netfs_unbuffered_write
>>>              repro-4038    [185] .....   114.225727: netfs_end_issue_write: netfs_end_issue_write
>>>              repro-4038    [185] .....   114.225728: netfs_end_issue_write: Write collector need poke 0
>>>              repro-4038    [185] .....   114.225728: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!
>>>      ... (syzbot reproducer hangs)
>>>
>>> There is a third "kworker/u1030" component that never gets woken up for
>>> reasons currently unknown to me with Oleg's optimization. I'll keep
>>> digging.
>>>
>>
>> Thanks for the update.
>>
>> It is unclear to me if you checked, so I'm going to have to ask just
>> in case: when there is a hang, is there *anyone* stuck in pipe code
>> (and if so, where)?
>>
>> You can get the kernel to print stacks for all threads with sysrq:
>> echo t > /proc/sysrq-trigger
> 
> This dumps an insane amount of stuff on my console. Let me search if there
> is any reference to pipe somewhere in there.

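(For anyone reproducing this: the sysrq-t dump lands in the kernel ring buffer, so one way to cut it down is to filter the dmesg output, keeping the `task:` header lines plus any pipe frames. A minimal sketch, demonstrated on a two-line excerpt of the dump rather than a live dmesg; on a real system the pipeline would start with `dmesg |` instead of the `printf`:)

```shell
# Sketch: filter a sysrq-t dump for pipe code. On a live system this would be
#   dmesg | grep -E 'task:|pipe_(read|write)\+'
# Here the dump is stood in for by a printf of two sample lines.
printf '%s\n' \
  '[ 1043.618621] task:containerd      state:S stack:0' \
  '[ 1043.684585]  ? pipe_write+0x42c/0x630' \
  | grep -E 'task:|pipe_(read|write)\+'
```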
The only pipe_read()/pipe_write() frames I found were in the following two tasks:

[ 1043.618621] task:containerd      state:S stack:0     pid:3567  tgid:3475  ppid:1      task_flags:0x400040 flags:0x00000002
[ 1043.629673] Call Trace:
[ 1043.632133]  <TASK>
[ 1043.634252]  __schedule+0x436/0x1620
[ 1043.637832]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.642638]  ? syscall_exit_to_user_mode+0x51/0x1a0
[ 1043.647538]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.652347]  ? __smp_call_single_queue+0xc7/0x150
[ 1043.657072]  schedule+0x28/0x110
[ 1043.660320]  futex_wait_queue+0x69/0xa0
[ 1043.664176]  __futex_wait+0x13c/0x1b0
[ 1043.667863]  ? __pfx_futex_wake_mark+0x10/0x10
[ 1043.672325]  futex_wait+0x68/0x110
[ 1043.675747]  ? __x64_sys_futex+0x77/0x1d0
[ 1043.679775]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.684585]  ? pipe_write+0x42c/0x630                   <----------- here
[ 1043.688269]  ? copy_fpstate_to_sigframe+0x330/0x3d0
[ 1043.693167]  do_futex+0x13c/0x1d0
[ 1043.696501]  __x64_sys_futex+0x77/0x1d0
[ 1043.700359]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.705170]  ? vfs_write+0x376/0x420
[ 1043.708767]  do_syscall_64+0x6f/0x110
[ 1043.712449]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.717259]  ? ksys_write+0x90/0xe0
[ 1043.720768]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.725579]  ? syscall_exit_to_user_mode+0x51/0x1a0
[ 1043.730475]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.735288]  ? do_syscall_64+0x7b/0x110
[ 1043.739144]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.743953]  ? irqentry_exit_to_user_mode+0x2e/0x160
[ 1043.748935]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

and

[ 1048.980518] task:dockerd         state:S stack:0     pid:3591  tgid:3582  ppid:1      task_flags:0x400040 flags:0x00000002
[ 1048.991569] Call Trace:
[ 1048.994024]  <TASK>
[ 1048.996143]  __schedule+0x436/0x1620
[ 1048.999741]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.004554]  ? get_nohz_timer_target+0x2a/0x180
[ 1049.009101]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.013911]  ? timerqueue_add+0x6a/0xc0
[ 1049.017768]  schedule+0x28/0x110
[ 1049.021017]  schedule_hrtimeout_range_clock+0x78/0xd0
[ 1049.026088]  ? __pfx_hrtimer_wakeup+0x10/0x10
[ 1049.030470]  do_epoll_wait+0x666/0x7d0
[ 1049.034245]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.039053]  ? __pfx_ep_autoremove_wake_function+0x10/0x10
[ 1049.044556]  do_compat_epoll_pwait.part.0+0xc/0x70
[ 1049.049368]  __x64_sys_epoll_pwait+0x83/0x140
[ 1049.053746]  do_syscall_64+0x6f/0x110
[ 1049.057425]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.062235]  ? do_syscall_64+0x7b/0x110
[ 1049.066079]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.070887]  ? do_futex+0xc2/0x1d0
[ 1049.074309]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.079120]  ? __x64_sys_futex+0x77/0x1d0
[ 1049.083151]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.087958]  ? current_time+0x31/0x130
[ 1049.091731]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.096539]  ? inode_update_timestamps+0xc8/0x110
[ 1049.101264]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.106074]  ? generic_update_time+0x51/0x60
[ 1049.110365]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.115173]  ? touch_atime+0xb5/0x100
[ 1049.118857]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.123671]  ? pipe_read+0x3fe/0x480                   <----------- here
[ 1049.127265]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.132073]  ? ep_done_scan+0xab/0xf0
[ 1049.135757]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.140567]  ? do_epoll_wait+0xe1/0x7d0
[ 1049.144425]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.149234]  ? do_compat_epoll_pwait.part.0+0xc/0x70
[ 1049.154218]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.159027]  ? __x64_sys_epoll_pwait+0x83/0x140
[ 1049.163577]  ? do_syscall_64+0x7b/0x110
[ 1049.167433]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.172243]  ? syscall_exit_to_user_mode+0x51/0x1a0
[ 1049.177141]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.181951]  ? do_syscall_64+0x7b/0x110
[ 1049.185811]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.190619]  ? ksys_read+0x90/0xe0
[ 1049.194025]  ? do_syscall_64+0x7b/0x110
[ 1049.197877]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.202682]  ? syscall_exit_to_user_mode+0x51/0x1a0
[ 1049.207579]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.212391]  ? do_syscall_64+0x7b/0x110
[ 1049.216243]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.221053]  ? do_syscall_64+0x7b/0x110
[ 1049.224900]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.229703]  ? do_syscall_64+0x7b/0x110
[ 1049.233561]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

-- 
Thanks and Regards,
Prateek

