Message-ID: <1792249.25CL7UOqfo@silver>
Date: Sat, 25 Mar 2023 13:45:44 +0100
From: Christian Schoenebeck <linux_oss@...debyte.com>
To: Dominique Martinet <asmadeus@...ewreck.org>
Cc: Jens Axboe <axboe@...nel.dk>, Latchesar Ionkov <lucho@...kov.net>,
Eric Van Hensbergen <ericvh@...il.com>,
linux-kernel@...r.kernel.org, Pengfei Xu <pengfei.xu@...el.com>,
v9fs-developer@...ts.sourceforge.net
Subject: Re: [V9fs-developer] [PATCH 0/5] Take 3 at async RPCs and no longer looping
forever on signals
On Thursday, March 23, 2023 4:58:51 PM CET Christian Schoenebeck wrote:
> On Sunday, March 19, 2023 12:53:46 PM CET Dominique Martinet wrote:
> > It's been a while but I didn't forget...
> >
> > Dominique Martinet wrote on Tue, Feb 14, 2023 at 08:16:38PM +0900:
> > > > Yes, apparently it tries to write dirty pages of the mapped file and keeps
> > > > hanging there [fs/9p/vfs_inode_dotl.c:586]:
> > >
> > > Yeah, it'd help to get the trace of the thread actually trying to do the
> > > IO, if it still exists.
> > > I had some hangs in the check that there are no flush in flight at some
> > > point, and I thought I fixed that, but I can't really see anywhere else
> > > that'd start hanging with this... it'll be clearer if I can reproduce.
> >
> > I couldn't reproduce this one, but manually inspecting
> > p9_client_wait_flush again I noticed the wait_event_interruptible was
> > waiting on req->flushed_req->wq but looking at req->status in the
> > condition; that was an error.
> > Also, we have a ref on req->flushed_req but not on req, so
> > req->flushed_req wasn't safe.
> >
> > I've changed the code to add a variable holding req->flushed_req directly
> > and use it consistently; I'm not sure that's the problem you ran into, but
> > it might help.
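Just to check I'm reading the fix right, roughly something like this, I
assume? (Only a sketch, not your actual patch: p9_client_wait_flush and
req->flushed_req are from your test branch, the rest is guessed from the
usual struct p9_req_t fields and the wait pattern in net/9p/client.c.)

/* sketch: wait on the flushed request's own wq and status */
static int p9_client_wait_flush_sketch(struct p9_req_t *req)
{
	/*
	 * We hold a ref on req->flushed_req but not on req itself, so
	 * take it into a local once and only use that from here on.
	 */
	struct p9_req_t *flushed = req->flushed_req;

	return wait_event_interruptible(flushed->wq,
			READ_ONCE(flushed->status) >= REQ_STATUS_RCVD);
}
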
> > It's been a while, but do you remember if that hang was consistently
> > happening on shutdown, or was it a one-time thing?
> >
> > Either way, I'd appreciate it if you could try my 9p-test branch again:
> > https://github.com/martinetd/linux/commits/9p-test
> >
> >
> > With that said, I expect that p9_client_wait_req will cause hangs on
> > broken servers.
> > If the connection drops, hopefully the reqs will just be marked as errors
> > and the thread freed, but I can see syzbot complaining about yet another
> > stuck thread... Well, it's interruptible at least, and bails out on
> > ERESTARTSYS.
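Right, at least with an interruptible wait the worst case on a broken server
is a task that can still be signalled. I picture the shape roughly like this
(again only a sketch with assumed names, not code from your branch):

	err = wait_event_interruptible(req->wq,
			READ_ONCE(req->status) >= REQ_STATUS_RCVD);
	if (err == -ERESTARTSYS)
		return err;	/* signal pending: bail out instead of looping */
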
>
> I gave your current test branch some spins today.
>
> And yes, the problem did not happen reliably each time, but consistently
> enough for me to reproduce it.
>
> Your latest test branch appears to have mitigated the problem. Once in a
> while on shutdown I still see it blocking for a few minutes, but then
> recovering:
>
> [** ] (2 of 3) A stop job is running for …ave Random Seed (2min 36s / 10min)
> [ 484.986388] INFO: task systemd-user-ru:8782 blocked for more than 120 seconds.
> [ 484.990150] Tainted: G E 6.3.0-rc2+ #63
> [ 484.992553] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 484.993657] task:systemd-user-ru state:D stack:0 pid:8782 ppid:1 flags:0x00000004
> [ 484.994863] Call Trace:
> [ 484.995398] <TASK>
> [ 484.995866] __schedule (kernel/sched/core.c:5304 kernel/sched/core.c:6622)
> [ 484.996614] schedule (kernel/sched/core.c:6699 (discriminator 1))
> [ 484.997283] d_alloc_parallel (./include/linux/spinlock.h:350 fs/dcache.c:2626 fs/dcache.c:2707)
> [ 484.998150] ? __pfx_default_wake_function (kernel/sched/core.c:6944)
> [ 484.999213] __lookup_slow (fs/namei.c:1671)
> [ 485.000006] walk_component (./include/linux/fs.h:773 fs/namei.c:1704 fs/namei.c:1994)
> [ 485.000805] path_lookupat (fs/namei.c:2451 fs/namei.c:2475)
> [ 485.001594] filename_lookup (fs/namei.c:2504)
> [ 485.002452] ? __check_object_size (mm/usercopy.c:196 mm/usercopy.c:251 mm/usercopy.c:213)
> [ 485.003523] ? strncpy_from_user (./include/asm-generic/access_ok.h:40 ./arch/x86/include/asm/uaccess.h:551 lib/strncpy_from_user.c:138)
> [ 485.004537] user_path_at_empty (fs/namei.c:2879)
> [ 485.005508] do_faccessat (fs/open.c:484)
> [ 485.006410] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
> [ 485.007281] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
> [ 485.008495] RIP: 0033:0x7f7b4527d8f7
> [ 485.009380] RSP: 002b:00007ffc9ee73c08 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
> [ 485.011118] RAX: ffffffffffffffda RBX: 00007f7b44ee2608 RCX: 00007f7b4527d8f7
> [ 485.012831] RDX: 000000000000001c RSI: 0000000000000000 RDI: 00007f7b44eda86b
> [ 485.014560] RBP: 0000000000000003 R08: 0000000000000000 R09: 00007f7b45360be0
> [ 485.016246] R10: 00005571ade756c0 R11: 0000000000000246 R12: 00007ffc9ee73cf8
> [ 485.017937] R13: 00007ffc9ee73d18 R14: 00007f7b44ee2610 R15: 0000000000000000
> [ 485.019669] </TASK>
>
> However, that now happens before the unmount rule is reached on shutdown?
> Previously it was hanging after the shutdown rule was reached. Does this
> make sense?
Correction: I still get the previous issue on unmount with your latest version
as well. :/
The call trace looks slightly different, but it seems to be the same issue:
[ OK ] Reached target Unmount All Filesystems.
[* ] A stop job is running for Load/Save Random Seed (3min 27s / 10min)
[ 243.031668] INFO: task systemd-journal:199 blocked for more than 120 seconds.
[ 243.036460] Tainted: G E 6.3.0-rc2+ #63
[ 243.037666] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.039004] task:systemd-journal state:D stack:0 pid:199 ppid:1 flags:0x00004004
[ 243.040434] Call Trace:
[ 243.040878] <TASK>
[ 243.041265] __schedule (kernel/sched/core.c:5304 kernel/sched/core.c:6622)
[ 243.041887] ? __pagevec_release (./include/linux/pagevec.h:38 mm/swap.c:1063)
[ 243.042594] schedule (kernel/sched/core.c:6699 (discriminator 1))
[ 243.043150] io_schedule (kernel/sched/core.c:8859 kernel/sched/core.c:8885)
[ 243.043751] folio_wait_bit_common (mm/filemap.c:1293)
[ 243.044519] ? __pfx_wake_page_function (mm/filemap.c:1083)
[ 243.045321] folio_wait_writeback (./arch/x86/include/asm/bitops.h:207 ./arch/x86/include/asm/bitops.h:239 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/page-flags.h:518 mm/page-writeback.c:3125)
[ 243.046043] __filemap_fdatawait_range (./arch/x86/include/asm/bitops.h:75 (discriminator 3) ./include/asm-generic/bitops/instrumented-atomic.h:42 (discriminator 3) ./include/linux/page-flags.h:473 (discriminator 3) mm/filemap.c:526 (discriminator 3))
[ 243.046828] ? filemap_fdatawrite_wbc (mm/filemap.c:393)
[ 243.047622] ? __filemap_fdatawrite_range (mm/filemap.c:424)
[ 243.048531] filemap_write_and_wait_range (mm/filemap.c:689 mm/filemap.c:669)
[ 243.049358] v9fs_vfs_setattr_dotl (./include/linux/pagemap.h:60 fs/9p/vfs_inode_dotl.c:587) 9p
[ 243.050229] notify_change (fs/attr.c:486)
[ 243.050924] ? __vfs_getxattr (fs/xattr.c:426)
[ 243.051571] ? do_truncate (./include/linux/fs.h:763 fs/open.c:67)
[ 243.052224] do_truncate (./include/linux/fs.h:763 fs/open.c:67)
[ 243.052816] do_sys_ftruncate (fs/open.c:194)
[ 243.053548] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 243.054187] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 243.055057] RIP: 0033:0x7f6db123ab77
[ 243.055706] RSP: 002b:00007ffc4dda0ca8 EFLAGS: 00000206 ORIG_RAX: 000000000000004d
[ 243.056978] RAX: ffffffffffffffda RBX: 00007ffc4dda0cf0 RCX: 00007f6db123ab77
[ 243.058198] RDX: 000055715f9b2a00 RSI: 0000000001000000 RDI: 0000000000000018
[ 243.059478] RBP: 000055715f9b2bf0 R08: 0000000000000001 R09: 000055715f9b2c74
[ 243.060716] R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffc4dda0ce8
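For reference, the place it is stuck is the same writeback wait as before, at
the top of the setattr/truncate path. From memory the relevant part of
v9fs_vfs_setattr_dotl() (fs/9p/vfs_inode_dotl.c, around the line in the
trace) looks roughly like this, so the task just waits for dirty pages whose
9p writes never complete (sketch only, may not match the tree exactly):

	/* Write all dirty data */
	if (S_ISREG(inode->i_mode))
		filemap_write_and_wait(inode->i_mapping);  /* <-- stuck here */

	retval = p9_client_setattr(fid, &p9attr);
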
I just realized I don't have to do anything fancy to reproduce this. I just
have to boot and shut down the guest a few times to trigger it.