Message-ID: <CAD+ocbw_xfWD1jLfSAd6aue+g1qxSqr1q7OY3+E9Z+3s+zZzoQ@mail.gmail.com>
Date:   Sun, 19 Dec 2021 19:19:07 -0800
From:   harshad shirwadkar <harshadshirwadkar@...il.com>
To:     Xin Yin <yinxin.x@...edance.com>
Cc:     Eric Whitney <enwlinux@...il.com>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [External] Re: generic/083 triggers hang on 5.16-rc5 w/fast_commit

On Sun, Dec 19, 2021 at 7:16 PM Xin Yin <yinxin.x@...edance.com> wrote:
>
> Thanks for Eric's report. After looking at these stacks, I think there
> is one scenario that may cause this issue:
> 1. Task A calls ext4_fc_start_update().
> 2. Task B starts a fast commit and waits for the inode update to complete.
> 3. Task A calls ext4_should_retry_alloc(), which wakes up jbd2 and waits
> for a full commit to complete.
> 4. jbd2 waits for the fast commit to complete.
>
> So I think maybe we should make
> ext4_fc_start_update()/ext4_fc_stop_update() more fine-grained now, as
> listed in the TODOs.
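>
> Schematically, the cycle in steps 1-4 looks like this (pseudo-code
> simplified from the stacks above, not the actual call chains):
>
>     /* Task A */
>     ext4_fc_start_update(inode);            /* (1) opens an fc update   */
>     ...
>     ext4_should_retry_alloc(sb, &retries);  /* (3) forces a full commit */
>         jbd2_log_wait_commit(journal, tid); /*     and waits for it     */
>
>     /* Task B */
>     ext4_fc_commit(journal, tid);           /* (2) waits for Task A's   */
>                                             /*     open fc update       */
>
>     /* jbd2 */
>     jbd2_journal_commit_transaction(journal); /* (4) waits for the fast */
>                                               /*     commit -> a cycle  */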

...and I just sent out a patch series to do that :)
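
The rough idea (a schematic sketch of the direction, not the actual
patches) is to fold the fast commit update window into the existing
journal start/stop path, so there is a single transaction API:

    handle_t *handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, 1);
    /* fc update implicitly opened together with the handle... */
    ...
    ext4_journal_stop(handle);
    /* ...and implicitly stopped here, so there is no separate
     * ext4_fc_start_update()/ext4_fc_stop_update() window to get
     * out of sync with the handle. */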

- Harshad

>  * TODOs
>  * -----
>  * 1) Make fast commit atomic updates more fine grained. Today, a fast commit
>  *    eligible update must be protected within ext4_fc_start_update() and
>  *    ext4_fc_stop_update(). These routines are called from much
>  *    higher-level routines. This can be made more fine grained by
>  *    combining with ext4_journal_start().
>
> Thanks,
> Xin Yin
>
>
>
> On Sat, Dec 18, 2021 at 7:50 AM harshad shirwadkar
> <harshadshirwadkar@...il.com> wrote:
> >
> > The problem seems to be in "ext4_should_retry_alloc()", which can
> > trigger a journal commit in certain cases. There are multiple callers
> > of this function, and all of them need to ensure that fast_commit
> > transactions have been stopped before calling it.
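> >
> > Schematically, each caller would need something like (pseudo-code):
> >
> >     ext4_fc_stop_update(inode);           /* stop the fc update first, */
> >     ret = ext4_should_retry_alloc(sb, &retries); /* since this may     */
> >     ext4_fc_start_update(inode);          /* force and wait on a full  */
> >     if (ret)                              /* commit                    */
> >         goto retry;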
> >
> > Thinking more generally about this, these problems are occurring
> > because we have different "transaction start" APIs for full commits
> > and fast commits. For full commits, we use
> > "jbd2__handle_start()/stop()", while for fast commits we have
> > "ext4_fc_start_update()" and "ext4_fc_stop_update()". Maintaining
> > these two different APIs is bound to cause problems in the long run.
> > Jan pointed this out during the initial review, and I think it's
> > better to fix it sooner rather than later. So what I'm considering in
> > terms of API blending is to maintain an ext4 global counter of
> > currently open handles and block fast commits while that counter is
> > non-zero. This change would make ext4_journal_start() block if a fast
> > commit is ongoing. IIUC, it already gets blocked if a full commit is
> > ongoing, right?
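> >
> > Roughly (pseudo-code for the idea; s_open_handles and s_fc_wait are
> > hypothetical names, not fields that exist today):
> >
> >     /* in ext4_journal_start(): */
> >     atomic_inc(&sbi->s_open_handles);
> >
> >     /* in ext4_journal_stop(): */
> >     if (atomic_dec_and_test(&sbi->s_open_handles))
> >         wake_up(&sbi->s_fc_wait);
> >
> >     /* in the fast commit path: block until no handles are open */
> >     wait_event(sbi->s_fc_wait,
> >                atomic_read(&sbi->s_open_handles) == 0);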
> >
> > On Fri, Dec 17, 2021 at 12:10 PM harshad shirwadkar
> > <harshadshirwadkar@...il.com> wrote:
> > >
> > > Thanks, Eric, for testing with Xin's patch. I also observed that the
> > > patch didn't fix it.
> > >
> > > I have found one root cause; with it addressed, the probability of
> > > failure drops drastically (it takes 200-300 runs to reproduce the
> > > issue). I'm looking for other places where something might be wrong.
> > > But one definite issue is that in ext4_buffered_write_iter() we start
> > > a fast commit transaction, and if for some reason the write fails, we
> > > try to perform a commit (hoping that the write would succeed after the
> > > commit). However, that commit waits for the opened fast commit
> > > transaction to finish, causing the thread to hang. I should be
> > > stopping the transaction before starting the commit. I'll send out the
> > > patch once I've found the other issues too.
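> > >
> > > In other words, the buggy ordering is roughly (pseudo-code):
> > >
> > >     ext4_fc_start_update(inode);      /* fc update still open...   */
> > >     ret = generic_perform_write(...); /* write fails (e.g. ENOSPC) */
> > >     if (write failed)
> > >         /* the retry path forces a commit, which waits for the fc
> > >          * update we ourselves still hold -> the thread hangs.
> > >          * The fix: ext4_fc_stop_update() before forcing it. */
> > >         jbd2_journal_force_commit_nested(journal);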
> > >
> > > Thanks,
> > > Harshad
> > >
> > > On Thu, Dec 16, 2021 at 1:36 PM Eric Whitney <enwlinux@...il.com> wrote:
> > > >
> > > > * Eric Whitney <enwlinux@...il.com>:
> > > > > I'm observing kernel hangs when running generic/083 using the latest
> > > > > version of the xfstests-bld test appliance on 5.16-rc5 with the adv test
> > > > > case.  The hangs typically occur once in 16 or 17 trials.  This is not a
> > > > > regression in 5.16, as it's reproducible on 5.15.  Modifying the adv test
> > > > > case first to exclude the inline_data feature and then to exclude all
> > > > > features results in 100 passed tests out of 100 trials in each case.
> > > > > Modifying adv to include only the fast_commit feature leads to hangs at
> > > > > the same rate as described above.  Output from a hung test run follows.
> > > > >
> > > > > Thanks,
> > > > > Eric
> > > > >
> > > > > generic/083 5s ...    [16:09:08][   24.909037] run fstests generic/083 at 2028
> > > > > [   25.266112] EXT4-fs (vdc): mounted filesystem with ordered data mode. Opts: .
> > > > > [  245.414394] INFO: task jbd2/vdc-8:4436 blocked for more than 122 seconds.
> > > > > [  245.418977]       Not tainted 5.16.0-rc5 #1
> > > > > [  245.421909] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
> > > > > [  245.425207] task:jbd2/vdc-8      state:D stack:    0 pid: 4436 ppid:     2 f0
> > > > > [  245.426372] Call Trace:
> > > > > [  245.426777]  <TASK>
> > > > > [  245.427130]  __schedule+0x309/0xab0
> > > > > [  245.427722]  ? lock_release+0x132/0x2a0
> > > > > [  245.428338]  ? wait_woken+0x60/0x60
> > > > > [  245.428899]  schedule+0x44/0xc0
> > > > > [  245.429407]  jbd2_journal_commit_transaction+0x174/0x1e40
> > > > > [  245.430276]  ? find_held_lock+0x2d/0x90
> > > > > [  245.430920]  ? try_to_del_timer_sync+0x4d/0x80
> > > > > [  245.431622]  ? wait_woken+0x60/0x60
> > > > > [  245.432189]  ? _raw_spin_unlock_irqrestore+0x3b/0x50
> > > > > [  245.432982]  ? try_to_del_timer_sync+0x4d/0x80
> > > > > [  245.433692]  kjournald2+0xcc/0x2a0
> > > > > [  245.434263]  ? wait_woken+0x60/0x60
> > > > > [  245.434820]  ? commit_timeout+0x10/0x10
> > > > > [  245.435435]  kthread+0x164/0x190
> > > > > [  245.435957]  ? set_kthread_struct+0x40/0x40
> > > > > [  245.436630]  ret_from_fork+0x22/0x30
> > > > > [  245.437218]  </TASK>
> > > > > [  245.437602] INFO: task fsstress:4440 blocked for more than 122 seconds.
> > > > > [  245.438643]       Not tainted 5.16.0-rc5 #1
> > > > > [  245.439306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
> > > > > [  245.440513] task:fsstress        state:D stack:    0 pid: 4440 ppid:  4439 f0
> > > > > [  245.441857] Call Trace:
> > > > > [  245.442240]  <TASK>
> > > > > [  245.442530]  __schedule+0x309/0xab0
> > > > > [  245.442903]  schedule+0x44/0xc0
> > > > > [  245.443237]  jbd2_log_wait_commit+0x119/0x190
> > > > > [  245.444011]  ? wait_woken+0x60/0x60
> > > > > [  245.444616]  __jbd2_journal_force_commit+0x5d/0xb0
> > > > > [  245.445374]  jbd2_journal_force_commit_nested+0xa/0x20
> > > > > [  245.446181]  ext4_should_retry_alloc+0x5c/0xb0
> > > > > [  245.446882]  ext4_da_write_begin+0xf2/0x310
> > > > > [  245.447572]  generic_perform_write+0xf0/0x1d0
> > > > > [  245.448258]  ? generic_update_time+0xa0/0xc0
> > > > > [  245.448935]  ? file_update_time+0xd4/0x110
> > > > > [  245.449582]  ext4_buffered_write_iter+0x88/0x120
> > > > > [  245.450314]  ext4_file_write_iter+0x5f/0x8a0
> > > > > [  245.451011]  ? lock_is_held_type+0xd8/0x130
> > > > > [  245.451662]  ? find_held_lock+0x2d/0x90
> > > > > [  245.452275]  do_iter_readv_writev+0x169/0x1d0
> > > > > [  245.452972]  do_iter_write+0x83/0x1c0
> > > > > [  245.453554]  iter_file_splice_write+0x265/0x390
> > > > > [  245.454298]  direct_splice_actor+0x31/0x40
> > > > > [  245.454934]  splice_direct_to_actor+0xfb/0x220
> > > > > [  245.455640]  ? pipe_to_sendpage+0xa0/0xa0
> > > > > [  245.456279]  do_splice_direct+0xa0/0xd0
> > > > > [  245.456895]  vfs_copy_file_range+0x1b6/0x5b0
> > > > > [  245.457590]  __x64_sys_copy_file_range+0xdd/0x200
> > > > > [  245.458325]  do_syscall_64+0x3a/0x80
> > > > > [  245.458891]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > [  245.459682] RIP: 0033:0x7f9419fb5f59
> > > > > [  245.460247] RSP: 002b:00007ffdf52d2428 EFLAGS: 00000246 ORIG_RAX: 00000000006
> > > > > [  245.461444] RAX: ffffffffffffffda RBX: 00007ffdf52d2478 RCX: 00007f9419fb5f59
> > > > > [  245.462537] RDX: 0000000000000004 RSI: 00007ffdf52d2470 RDI: 0000000000000003
> > > > > [  245.463640] RBP: 0000000000019f46 R08: 0000000000019f46 R09: 0000000000000000
> > > > > [  245.464770] R10: 00007ffdf52d2478 R11: 0000000000000246 R12: 0000000000000004
> > > > > [  245.465876] R13: 00007ffdf52d2470 R14: 0000000000000003 R15: 000000000009858f
> > > > > [  245.466989]  </TASK>
> > > > > [  245.467343] INFO: task fsstress:4441 blocked for more than 122 seconds.
> > > > > [  245.468383]       Not tainted 5.16.0-rc5 #1
> > > > > [  245.469031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
> > > > > [  245.470238] task:fsstress        state:D stack:    0 pid: 4441 ppid:  4439 f0
> > > > > [  245.471436] Call Trace:
> > > > > [  245.471681]  <TASK>
> > > > > [  245.471924]  __schedule+0x309/0xab0
> > > > > [  245.472683]  schedule+0x44/0xc0
> > > > > [  245.473181]  jbd2_log_wait_commit+0x119/0x190
> > > > > [  245.473869]  ? wait_woken+0x60/0x60
> > > > > [  245.474448]  __jbd2_journal_force_commit+0x5d/0xb0
> > > > > [  245.475197]  jbd2_journal_force_commit_nested+0xa/0x20
> > > > > [  245.476004]  ext4_should_retry_alloc+0x5c/0xb0
> > > > > [  245.476709]  ext4_write_begin+0x168/0x520
> > > > > [  245.477352]  __page_symlink+0xbe/0x110
> > > > > [  245.477977]  ext4_symlink+0x1db/0x3d0
> > > > > [  245.478561]  vfs_symlink+0x109/0x1b0
> > > > > [  245.479124]  do_symlinkat+0xde/0xf0
> > > > > [  245.479685]  __x64_sys_symlink+0x37/0x40
> > > > > [  245.480306]  do_syscall_64+0x3a/0x80
> > > > > [  245.480903]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > [  245.481677] RIP: 0033:0x7f9419fadf07
> > > > > [  245.482266] RSP: 002b:00007ffdf52d2c58 EFLAGS: 00000206 ORIG_RAX: 00000000008
> > > > > [  245.483436] RAX: ffffffffffffffda RBX: 000000000000023d RCX: 00007f9419fadf07
> > > > > [  245.484573] RDX: 0000000000000064 RSI: 00005595aed07d60 RDI: 00005595aed0bd00
> > > > > [  245.485680] RBP: 00005595aed0bd00 R08: 0000000000000004 R09: 00005595aecf3980
> > > > > [  245.486774] R10: 0000000000000006 R11: 0000000000000206 R12: 00005595aed07d60
> > > > > [  245.487901] R13: 00007ffdf52d2dc0 R14: 00005595aed0bd00 R15: 00005595ad3dd450
> > > > > [  245.489012]  </TASK>
> > > > > [  245.489358] INFO: task fsstress:4442 blocked for more than 122 seconds.
> > > > > [  245.490390]       Not tainted 5.16.0-rc5 #1
> > > > > [  245.491059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
> > > > > [  245.492255] task:fsstress        state:D stack:    0 pid: 4442 ppid:  4439 f0
> > > > > [  245.493541] Call Trace:
> > > > > [  245.493933]  <TASK>
> > > > > [  245.494298]  __schedule+0x309/0xab0
> > > > > [  245.494851]  schedule+0x44/0xc0
> > > > > [  245.495355]  jbd2_log_wait_commit+0x119/0x190
> > > > > [  245.496038]  ? wait_woken+0x60/0x60
> > > > > [  245.496604]  __jbd2_journal_force_commit+0x5d/0xb0
> > > > > [  245.497358]  jbd2_journal_force_commit_nested+0xa/0x20
> > > > > [  245.498175]  ext4_should_retry_alloc+0x5c/0xb0
> > > > > [  245.498865]  ext4_iomap_begin+0x193/0x2b0
> > > > > [  245.499499]  iomap_iter+0x184/0x4a0
> > > > > [  245.500053]  __iomap_dio_rw+0x249/0x7b0
> > > > > [  245.500686]  iomap_dio_rw+0xa/0x30
> > > > > [  245.501203]  ext4_file_write_iter+0x421/0x8a0
> > > > > [  245.501840]  new_sync_write+0x125/0x1c0
> > > > > [  245.502355]  vfs_write+0x20f/0x370
> > > > > [  245.502757]  ksys_write+0x5f/0xe0
> > > > > [  245.503120]  do_syscall_64+0x3a/0x80
> > > > > [  245.503472]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > [  245.503956] RIP: 0033:0x7f941a094471
> > > > > [  245.504328] RSP: 002b:00007ffdf52d2908 EFLAGS: 00000246 ORIG_RAX: 00000000001
> > > > > [  245.505399] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f941a094471
> > > > > [  245.506509] RDX: 000000000000c000 RSI: 00005595aed08000 RDI: 0000000000000003
> > > > > [  245.507663] RBP: 000000000018f000 R08: 0000000000000b4f R09: 00007f941a07e0d0
> > > > > [  245.508776] R10: 00005595aecf2010 R11: 0000000000000246 R12: 00000000000003c7
> > > > > [  245.509888] R13: 000000000000c000 R14: 00005595aed08000 R15: 0000000000000000
> > > > > [  245.511041]  </TASK>
> > > > > [  245.511398] INFO: task fsstress:4443 blocked for more than 122 seconds.
> > > > > [  245.512430]       Not tainted 5.16.0-rc5 #1
> > > > > [  245.513103] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
> > > > > [  245.514354] task:fsstress        state:D stack:    0 pid: 4443 ppid:  4439 f0
> > > > > [  245.515652] Call Trace:
> > > > > [  245.516054]  <TASK>
> > > > > [  245.516408]  __schedule+0x309/0xab0
> > > > > [  245.516976]  schedule+0x44/0xc0
> > > > > [  245.517483]  jbd2_log_wait_commit+0x119/0x190
> > > > > [  245.518199]  ? wait_woken+0x60/0x60
> > > > > [  245.518758]  __jbd2_journal_force_commit+0x5d/0xb0
> > > > > [  245.519520]  jbd2_journal_force_commit_nested+0xa/0x20
> > > > > [  245.520330]  ext4_should_retry_alloc+0x5c/0xb0
> > > > > [  245.521065]  ext4_da_write_begin+0xf2/0x310
> > > > > [  245.521736]  generic_perform_write+0xf0/0x1d0
> > > > > [  245.522425]  ? generic_update_time+0xa0/0xc0
> > > > > [  245.523102]  ? file_update_time+0xd4/0x110
> > > > > [  245.523759]  ext4_buffered_write_iter+0x88/0x120
> > > > > [  245.524521]  ext4_file_write_iter+0x5f/0x8a0
> > > > > [  245.525197]  ? lock_is_held_type+0xd8/0x130
> > > > > [  245.525875]  do_iter_readv_writev+0x169/0x1d0
> > > > > [  245.526568]  do_iter_write+0x83/0x1c0
> > > > > [  245.527155]  vfs_writev+0x9c/0x280
> > > > > [  245.527725]  ? lock_is_held_type+0xd8/0x130
> > > > > [  245.528388]  ? find_held_lock+0x2d/0x90
> > > > > [  245.528911]  ? do_writev+0x6b/0x110
> > > > > [  245.529443]  ? syscall_enter_from_user_mode+0x20/0x70
> > > > > [  245.530182]  do_writev+0x6b/0x110
> > > > > [  245.530674]  do_syscall_64+0x3a/0x80
> > > > > [  245.531219]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > [  245.531978] RIP: 0033:0x7f9419fb2504
> > > > > [  245.532498] RSP: 002b:00007ffdf52d28f8 EFLAGS: 00000246 ORIG_RAX: 00000000004
> > > > > [  245.533817] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f9419fb2504
> > > > > [  245.534870] RDX: 00000000000000d9 RSI: 00005595aed072f0 RDI: 0000000000000003
> > > > > [  245.535915] RBP: 000000000000040a R08: 00005595aed7cf55 R09: 00007f941a07e3a0
> > > > > [  245.536957] R10: 00005595aecf2010 R11: 0000000000000246 R12: 00005595aed072f0
> > > > > [  245.538003] R13: 00005595aed61250 R14: 00000000000000d9 R15: 000000000000020d
> > > > > [  245.539051]  </TASK>
> > > > > [  245.539382] INFO: task fsstress:4444 blocked for more than 123 seconds.
> > > > > [  245.540346]       Not tainted 5.16.0-rc5 #1
> > > > > [  245.541007] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
> > > > > [  245.542237] task:fsstress        state:D stack:    0 pid: 4444 ppid:  4439 f0
> > > > > [  245.543252] Call Trace:
> > > > > [  245.543555]  <TASK>
> > > > > [  245.543821]  __schedule+0x309/0xab0
> > > > > [  245.544294]  schedule+0x44/0xc0
> > > > > [  245.544779]  jbd2_log_wait_commit+0x119/0x190
> > > > > [  245.545203]  ? wait_woken+0x60/0x60
> > > > > [  245.545546]  __jbd2_journal_force_commit+0x5d/0xb0
> > > > > [  245.546009]  jbd2_journal_force_commit_nested+0xa/0x20
> > > > > [  245.546501]  ext4_should_retry_alloc+0x5c/0xb0
> > > > > [  245.546929]  ext4_da_write_begin+0xf2/0x310
> > > > > [  245.547337]  generic_perform_write+0xf0/0x1d0
> > > > > [  245.547797]  ext4_buffered_write_iter+0x88/0x120
> > > > > [  245.548398]  ext4_file_write_iter+0x5f/0x8a0
> > > > > [  245.548998]  new_sync_write+0x125/0x1c0
> > > > > [  245.549568]  vfs_write+0x20f/0x370
> > > > > [  245.550070]  ksys_write+0x5f/0xe0
> > > > > [  245.550560]  do_syscall_64+0x3a/0x80
> > > > > [  245.551109]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > [  245.551857] RIP: 0033:0x7f941a094471
> > > > > [  245.552384] RSP: 002b:00007ffdf52d2908 EFLAGS: 00000246 ORIG_RAX: 00000000001
> > > > > [  245.553480] RAX: ffffffffffffffda RBX: 00000000000184da RCX: 00007f941a094471
> > > > > [  245.554527] RDX: 00000000000184da RSI: 00005595aedc43c0 RDI: 0000000000000003
> > > > > [  245.555564] RBP: 0000000000000311 R08: 0000000000000000 R09: 00007f941a07dcd0
> > > > > [  245.556601] R10: 00005595aecf2010 R11: 0000000000000246 R12: 0000000000000003
> > > > > [  245.557648] R13: 00000000001d8893 R14: 00005595aedc43c0 R15: 0000000000000000
> > > > > [  245.558694]  </TASK>
> > > > > [  245.559021] INFO: task fsstress:4445 blocked for more than 123 seconds.
> > > > > [  245.559977]       Not tainted 5.16.0-rc5 #1
> > > > > [  245.560591] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
> > > > > [  245.561733] task:fsstress        state:D stack:    0 pid: 4445 ppid:  4439 f0
> > > > > [  245.562945] Call Trace:
> > > > > [  245.563491]  <TASK>
> > > > > [  245.563887]  __schedule+0x309/0xab0
> > > > > [  245.564427]  schedule+0x44/0xc0
> > > > > [  245.564905]  jbd2_log_wait_commit+0x119/0x190
> > > > > [  245.565542]  ? wait_woken+0x60/0x60
> > > > > [  245.566056]  __jbd2_journal_force_commit+0x5d/0xb0
> > > > > [  245.566750]  jbd2_journal_force_commit_nested+0xa/0x20
> > > > > [  245.567494]  ext4_should_retry_alloc+0x5c/0xb0
> > > > > [  245.568156]  ext4_da_write_begin+0xf2/0x310
> > > > > [  245.568788]  generic_perform_write+0xf0/0x1d0
> > > > > [  245.569429]  ext4_buffered_write_iter+0x88/0x120
> > > > > [  245.570102]  ext4_file_write_iter+0x5f/0x8a0
> > > > > [  245.570730]  new_sync_write+0x125/0x1c0
> > > > > [  245.571313]  vfs_write+0x20f/0x370
> > > > > [  245.571821]  ksys_write+0x5f/0xe0
> > > > > [  245.572311]  do_syscall_64+0x3a/0x80
> > > > > [  245.572842]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > [  245.573573] RIP: 0033:0x7f941a094471
> > > > > [  245.574097] RSP: 002b:00007ffdf52d2908 EFLAGS: 00000246 ORIG_RAX: 00000000001
> > > > > [  245.575199] RAX: ffffffffffffffda RBX: 0000000000019da9 RCX: 00007f941a094471
> > > > > [  245.576239] RDX: 0000000000019da9 RSI: 00005595aedc23f0 RDI: 0000000000000003
> > > > > [  245.577271] RBP: 0000000000000359 R08: 0000000000000003 R09: 00007f941a07dd60
> > > > > [  245.578319] R10: 00005595aecf2010 R11: 0000000000000246 R12: 0000000000000003
> > > > > [  245.579357] R13: 0000000000147e54 R14: 00005595aedc23f0 R15: 0000000000000000
> > > > > [  245.580395]  </TASK>
> > > > > [  245.580729] INFO: task fsstress:4446 blocked for more than 123 seconds.
> > > > > [  245.581693]       Not tainted 5.16.0-rc5 #1
> > > > > [  245.582344] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
> > > > > [  245.583552] task:fsstress        state:D stack:    0 pid: 4446 ppid:  4439 f0
> > > > > [  245.584844] Call Trace:
> > > > > [  245.585145]  <TASK>
> > > > > [  245.585491]  __schedule+0x309/0xab0
> > > > > [  245.586040]  ? lock_release+0x132/0x2a0
> > > > > [  245.586560]  schedule+0x44/0xc0
> > > > > [  245.586866]  ext4_fc_commit+0x447/0xa90
> > > > > [  245.587231]  ? jbd2_trans_will_send_data_barrier+0x7f/0xa0
> > > > > [  245.587765]  ? wait_woken+0x60/0x60
> > > > > [  245.588103]  ext4_sync_file+0x3fc/0x470
> > > > > [  245.588466]  do_fsync+0x38/0x70
> > > > > [  245.588917]  __x64_sys_fdatasync+0x13/0x20
> > > > > [  245.589517]  do_syscall_64+0x3a/0x80
> > > > > [  245.590042]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > [  245.590774] RIP: 0033:0x7f9419fb32c4
> > > > > [  245.591317] RSP: 002b:00007ffdf52d2db8 EFLAGS: 00000246 ORIG_RAX: 0000000000b
> > > > > [  245.592439] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f9419fb32c4
> > > > > [  245.593465] RDX: 00007ffdf52d2d20 RSI: 00007ffdf52d2d20 RDI: 0000000000000003
> > > > > [  245.594762] RBP: 00000000000003df R08: 00007f941a07dc40 R09: 00007ffdf52d2a06
> > > > > [  245.595861] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000002ee
> > > > > [  245.596968] R13: 0000000051eb851f R14: 00007ffdf52d2e60 R15: 00005595ad3d85f0
> > > > > [  245.598100]  </TASK>
> > > > > [  245.598364] INFO: task fsstress:4447 blocked for more than 123 seconds.
> > > > > [  245.599382]       Not tainted 5.16.0-rc5 #1
> > > > > [  245.600039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
> > > > > [  245.601273] task:fsstress        state:D stack:    0 pid: 4447 ppid:  4439 f0
> > > > > [  245.602406] Call Trace:
> > > > > [  245.602829]  <TASK>
> > > > > [  245.603191]  __schedule+0x309/0xab0
> > > > > [  245.603788]  schedule+0x44/0xc0
> > > > > [  245.604324]  jbd2_log_wait_commit+0x119/0x190
> > > > > [  245.604897]  ? wait_woken+0x60/0x60
> > > > > [  245.605493]  __jbd2_journal_force_commit+0x5d/0xb0
> > > > > [  245.606290]  jbd2_journal_force_commit_nested+0xa/0x20
> > > > > [  245.607136]  ext4_should_retry_alloc+0x5c/0xb0
> > > > > [  245.607885]  ext4_da_write_begin+0xf2/0x310
> > > > > [  245.608423]  generic_perform_write+0xf0/0x1d0
> > > > > [  245.609148]  ? generic_update_time+0xa0/0xc0
> > > > > [  245.609861]  ? file_update_time+0xd4/0x110
> > > > > [  245.610527]  ext4_buffered_write_iter+0x88/0x120
> > > > > [  245.611296]  ext4_file_write_iter+0x5f/0x8a0
> > > > > [  245.611840]  ? lock_is_held_type+0xd8/0x130
> > > > > [  245.612470]  ? find_held_lock+0x2d/0x90
> > > > > [  245.613055]  do_iter_readv_writev+0x169/0x1d0
> > > > > [  245.613709]  do_iter_write+0x83/0x1c0
> > > > > [  245.614272]  iter_file_splice_write+0x265/0x390
> > > > > [  245.615015]  direct_splice_actor+0x31/0x40
> > > > > [  245.615676]  splice_direct_to_actor+0xfb/0x220
> > > > > [  245.616390]  ? pipe_to_sendpage+0xa0/0xa0
> > > > > [  245.617038]  do_splice_direct+0xa0/0xd0
> > > > > [  245.617684]  vfs_copy_file_range+0x1b6/0x5b0
> > > > > [  245.618338]  __x64_sys_copy_file_range+0xdd/0x200
> > > > > [  245.619026]  do_syscall_64+0x3a/0x80
> > > > > [  245.619552]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > [  245.620283] RIP: 0033:0x7f9419fb5f59
> > > > > [  245.620811] RSP: 002b:00007ffdf52d2428 EFLAGS: 00000246 ORIG_RAX: 00000000006
> > > > > [  245.621907] RAX: ffffffffffffffda RBX: 00007ffdf52d2478 RCX: 00007f9419fb5f59
> > > > > [  245.622986] RDX: 0000000000000004 RSI: 00007ffdf52d2470 RDI: 0000000000000003
> > > > > [  245.623983] RBP: 000000000001ddb5 R08: 000000000001ddb5 R09: 0000000000000000
> > > > > [  245.625033] R10: 00007ffdf52d2478 R11: 0000000000000246 R12: 0000000000000004
> > > > > [  245.626162] R13: 00007ffdf52d2470 R14: 0000000000000003 R15: 000000000030e49f
> > > > > [  245.627282]  </TASK>
> > > > > [  245.627654] INFO: task fsstress:4448 blocked for more than 123 seconds.
> > > > > [  245.628601]       Not tainted 5.16.0-rc5 #1
> > > > > [  245.629266] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
> > > > > [  245.630484] task:fsstress        state:D stack:    0 pid: 4448 ppid:  4439 f0
> > > > > [  245.631788] Call Trace:
> > > > > [  245.632162]  <TASK>
> > > > > [  245.632495]  __schedule+0x309/0xab0
> > > > > [  245.632991]  schedule+0x44/0xc0
> > > > > [  245.633395]  jbd2_log_wait_commit+0x119/0x190
> > > > > [  245.633958]  ? wait_woken+0x60/0x60
> > > > > [  245.634465]  __jbd2_journal_force_commit+0x5d/0xb0
> > > > > [  245.635224]  jbd2_journal_force_commit_nested+0xa/0x20
> > > > > [  245.635889]  ext4_should_retry_alloc+0x5c/0xb0
> > > > > [  245.636464]  ext4_da_write_begin+0xf2/0x310
> > > > > [  245.637042]  generic_perform_write+0xf0/0x1d0
> > > > > [  245.637631]  ext4_buffered_write_iter+0x88/0x120
> > > > > [  245.638294]  ext4_file_write_iter+0x5f/0x8a0
> > > > > [  245.638847]  new_sync_write+0x125/0x1c0
> > > > > [  245.639349]  vfs_write+0x20f/0x370
> > > > > [  245.639787]  ksys_write+0x5f/0xe0
> > > > > [  245.640220]  do_syscall_64+0x3a/0x80
> > > > > [  245.640685]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > [  245.641351] RIP: 0033:0x7f941a094471
> > > > > [  245.641910] RSP: 002b:00007ffdf52d2908 EFLAGS: 00000246 ORIG_RAX: 00000000001
> > > > > [  245.642861] RAX: ffffffffffffffda RBX: 00000000000094f2 RCX: 00007f941a094471
> > > > > [  245.643751] RDX: 00000000000094f2 RSI: 00005595aecf55c0 RDI: 0000000000000003
> > > > > [  245.644686] RBP: 000000000000040e R08: 00005595aed4d330 R09: 00007f941a07dcc0
> > > > > [  245.645630] R10: 00005595aecf2010 R11: 0000000000000246 R12: 0000000000000003
> > > > > [  245.646501] R13: 00000000000dd6d8 R14: 00005595aecf55c0 R15: 0000000000000000
> > > > > [  245.647380]  </TASK>
> > > > > [  245.647684]
> > > > > [  245.647684] Showing all locks held in the system:
> > > > > [  245.648492] 1 lock held by khungtaskd/25:
> > > > > [  245.648992]  #0: ffffffff82762e40 (rcu_read_lock){....}-{1:2}, at: debug_shoa
> > > > > [  245.650094] 2 locks held by fsstress/4440:
> > > > > [  245.650600]  #0: ffff88800bb09448 (sb_writers#3){.+.+}-{0:0}, at: __x64_sys_0
> > > > > [  245.651761]  #1: ffff888013cad850 (&sb->s_type->i_mutex_key#9){++++}-{3:3}, 0
> > > > > [  245.653096] 2 locks held by fsstress/4441:
> > > > > [  245.653595]  #0: ffff88800bb09448 (sb_writers#3){.+.+}-{0:0}, at: filename_c0
> > > > > [  245.654650]  #1: ffff88800750f470 (&type->i_mutex_dir_key#3/1){+.+.}-{3:3}, 0
> > > > > [  245.656111] 2 locks held by fsstress/4442:
> > > > > [  245.656655]  #0: ffff88800bb09448 (sb_writers#3){.+.+}-{0:0}, at: ksys_write0
> > > > > [  245.657707]  #1: ffff8880076f32d0 (&sb->s_type->i_mutex_key#9){++++}-{3:3}, 0
> > > > > [  245.659084] 2 locks held by fsstress/4443:
> > > > > [  245.659618]  #0: ffff88800bb09448 (sb_writers#3){.+.+}-{0:0}, at: do_writev+0
> > > > > [  245.660632]  #1: ffff888007686b10 (&sb->s_type->i_mutex_key#9){++++}-{3:3}, 0
> > > > > [  245.661960] 2 locks held by fsstress/4444:
> > > > > [  245.662540]  #0: ffff88800bb09448 (sb_writers#3){.+.+}-{0:0}, at: ksys_write0
> > > > > [  245.663538]  #1: ffff8880178c96b0 (&sb->s_type->i_mutex_key#9){++++}-{3:3}, 0
> > > > > [  245.664873] 2 locks held by fsstress/4445:
> > > > > [  245.665442]  #0: ffff88800bb09448 (sb_writers#3){.+.+}-{0:0}, at: ksys_write0
> > > > > [  245.666441]  #1: ffff88801a8da970 (&sb->s_type->i_mutex_key#9){++++}-{3:3}, 0
> > > > > [  245.667806] 2 locks held by fsstress/4447:
> > > > > [  245.668389]  #0: ffff88800bb09448 (sb_writers#3){.+.+}-{0:0}, at: __x64_sys_0
> > > > > [  245.669588]  #1: ffff88800777c590 (&sb->s_type->i_mutex_key#9){++++}-{3:3}, 0
> > > > > [  245.670944] 2 locks held by fsstress/4448:
> > > > > [  245.671520]  #0: ffff88800bb09448 (sb_writers#3){.+.+}-{0:0}, at: ksys_write0
> > > > > [  245.672537]  #1: ffff888017ac2010 (&sb->s_type->i_mutex_key#9){++++}-{3:3}, 0
> > > > > [  245.673888] 2 locks held by fsstress/4449:
> > > > > [  245.674455]  #0: ffff88800bb09448 (sb_writers#3){.+.+}-{0:0}, at: ksys_fallo0
> > > > > [  245.675596]  #1: ffff88800747bc30 (&sb->s_type->i_mutex_key#9){++++}-{3:3}, 0
> > > > > [  245.676839] 2 locks held by fsstress/4450:
> > > > > [  245.677356]  #0: ffff88800bb09448 (sb_writers#3){.+.+}-{0:0}, at: do_writev+0
> > > > > [  245.678379]  #1: ffff888013caf470 (&sb->s_type->i_mutex_key#9){++++}-{3:3}, 0
> > > > > [  245.679794] 2 locks held by fsstress/4451:
> > > > > [  245.680329]  #0: ffff88800bb09448 (sb_writers#3){.+.+}-{0:0}, at: ksys_write0
> > > > > [  245.681390]  #1: ffff888007542010 (&sb->s_type->i_mutex_key#9){++++}-{3:3}, 0
> > > > > [  245.682734] 1 lock held by fsstress/4452:
> > > > > [  245.683243]  #0: ffff88800bb090e0 (&type->s_umount_key#31){++++}-{3:3}, at: 0
> > > > > [  245.684412] 1 lock held by fsstress/4453:
> > > > > [  245.684967]  #0: ffff88800bb090e0 (&type->s_umount_key#31){++++}-{3:3}, at: 0
> > > > > [  245.686109] 2 locks held by fsstress/4454:
> > > > > [  245.686674]  #0: ffff88800bb09448 (sb_writers#3){.+.+}-{0:0}, at: do_writev+0
> > > > > [  245.687718]  #1: ffff8880178461b0 (&sb->s_type->i_mutex_key#9){++++}-{3:3}, 0
> > > > > [  245.689137]
> > > > > [  245.689346] =============================================
> > > > > [  245.689346]
> > > > >
> > > >
> > > >
> > > > Hi Harshad:
> > > >
> > > > FWIW, I've applied Xin Yin's hang fix for fallocate when using fast_commit to a
> > > > 5.16-rc5 kernel and I'm still seeing generic/083 hang when running the adv
> > > > test case on the test appliance.  If anything, it actually hangs more quickly
> > > > and reliably - generally on the first or second trial in a sequence.
> > > >
> > > > Eric
> > > >
