lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <f1aa6b0d-bb48-f0e3-f9e2-5b1199a9635d@kernel.org> Date: Thu, 3 Mar 2022 10:32:33 +0800 From: Chao Yu <chao@...nel.org> To: Jaegeuk Kim <jaegeuk@...nel.org> Cc: Jing Xia <jing.xia@...soc.com>, linux-f2fs-devel@...ts.sourceforge.net, Zhiguo Niu <zhiguo.niu@...soc.com>, linux-kernel@...r.kernel.org Subject: Re: [f2fs-dev] [PATCH] f2fs: fix to avoid potential deadlock On 2022/3/3 3:45, Jaegeuk Kim wrote: > On 03/02, Chao Yu wrote: >> On 2022/3/2 13:26, Jaegeuk Kim wrote: >>> On 03/02, Chao Yu wrote: >>>> ping, >>>> >>>> On 2022/2/25 11:02, Chao Yu wrote: >>>>> On 2022/2/3 22:57, Chao Yu wrote: >>>>>> On 2022/2/3 9:51, Jaegeuk Kim wrote: >>>>>>> On 01/29, Chao Yu wrote: >>>>>>>> On 2022/1/29 8:37, Jaegeuk Kim wrote: >>>>>>>>> On 01/28, Chao Yu wrote: >>>>>>>>>> On 2022/1/28 5:59, Jaegeuk Kim wrote: >>>>>>>>>>> On 01/27, Chao Yu wrote: >>>>>>>>>>>> Quoted from Jing Xia's report, there is a potential deadlock may happen >>>>>>>>>>>> between kworker and checkpoint as below: >>>>>>>>>>>> >>>>>>>>>>>> [T:writeback] [T:checkpoint] >>>>>>>>>>>> - wb_writeback >>>>>>>>>>>> - blk_start_plug >>>>>>>>>>>> bio contains NodeA was plugged in writeback threads >>>>>>>>>>> >>>>>>>>>>> I'm still trying to understand more precisely. So, how is it possible to >>>>>>>>>>> have bio having node write in this current context? >>>>>>>>>> >>>>>>>>>> IMO, after above blk_start_plug(), it may plug some inode's node page in kworker >>>>>>>>>> during writebacking node_inode's data page (which should be node page)? >>>>>>>>> >>>>>>>>> Wasn't that added into a different task->plug? >>>>>>>> >>>>>>>> I'm not sure I've got your concern correctly... >>>>>>>> >>>>>>>> Do you mean NodeA and other IOs from do_writepages() were plugged in >>>>>>>> different local plug variables? >>>>>>> >>>>>>> I think so. >>>>>> >>>>>> I guess block plug helper says it doesn't allow to use nested plug, so there >>>>>> is only one plug in kworker thread? >>> >>> Is there only one kworker thread that flushes node and inode pages? >> >> IIRC, =one kworker per block device? > > If there's one kworker only, f2fs_write_node_pages() should have flushed its > plug? No, f2fs_write_node_pages() failed to attach local plug into current->plug due to current has attached plug from wb_writeback(), and also, f2fs_write_node_pages() will fail to flush current->plug due to its local plug doesn't match current->plug. void blk_start_plug_nr_ios() { if (tsk->plug) return; ... } void blk_finish_plug(struct blk_plug *plug) { if (plug == current->plug) { __blk_flush_plug(plug, false); current->plug = NULL; } } Thanks, > >> >> Thanks, >> >>> >>>>>> >>>>>> void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios) >>>>>> { >>>>>> struct task_struct *tsk = current; >>>>>> >>>>>> /* >>>>>> * If this is a nested plug, don't actually assign it. >>>>>> */ >>>>>> if (tsk->plug) >>>>>> return; >>>>>> ... >>>>>> } >>>>> >>>>> Any further comments? >>>>> >>>>> Thanks, >>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> - do_writepages -- sync write inodeB, inc wb_sync_req[DATA] >>>>>>>>>>>> - f2fs_write_data_pages >>>>>>>>>>>> - f2fs_write_single_data_page -- write last dirty page >>>>>>>>>>>> - f2fs_do_write_data_page >>>>>>>>>>>> - set_page_writeback -- clear page dirty flag and >>>>>>>>>>>> PAGECACHE_TAG_DIRTY tag in radix tree >>>>>>>>>>>> - f2fs_outplace_write_data >>>>>>>>>>>> - f2fs_update_data_blkaddr >>>>>>>>>>>> - f2fs_wait_on_page_writeback -- wait NodeA to writeback here >>>>>>>>>>>> - inode_dec_dirty_pages >>>>>>>>>>>> - writeback_sb_inodes >>>>>>>>>>>> - writeback_single_inode >>>>>>>>>>>> - do_writepages >>>>>>>>>>>> - f2fs_write_data_pages -- skip writepages due to wb_sync_req[DATA] >>>>>>>>>>>> - wbc->pages_skipped += get_dirty_pages() -- PAGECACHE_TAG_DIRTY is not set but get_dirty_pages() returns one >>>>>>>>>>>> - requeue_inode -- requeue inode to wb->b_dirty queue due to non-zero.pages_skipped >>>>>>>>>>>> - blk_finish_plug >>>>>>>>>>>> >>>>>>>>>>>> Let's try to avoid deadlock condition by forcing unplugging previous bio via >>>>>>>>>>>> blk_finish_plug(current->plug) once we'v skipped writeback in writepages() >>>>>>>>>>>> due to valid sbi->wb_sync_req[DATA/NODE]. >>>>>>>>>>>> >>>>>>>>>>>> Fixes: 687de7f1010c ("f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE") >>>>>>>>>>>> Signed-off-by: Zhiguo Niu <zhiguo.niu@...soc.com> >>>>>>>>>>>> Signed-off-by: Jing Xia <jing.xia@...soc.com> >>>>>>>>>>>> Signed-off-by: Chao Yu <chao@...nel.org> >>>>>>>>>>>> --- >>>>>>>>>>>> fs/f2fs/data.c | 6 +++++- >>>>>>>>>>>> fs/f2fs/node.c | 6 +++++- >>>>>>>>>>>> 2 files changed, 10 insertions(+), 2 deletions(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c >>>>>>>>>>>> index 76d6fe7b0c8f..932a4c81acaf 100644 >>>>>>>>>>>> --- a/fs/f2fs/data.c >>>>>>>>>>>> +++ b/fs/f2fs/data.c >>>>>>>>>>>> @@ -3174,8 +3174,12 @@ static int __f2fs_write_data_pages(struct address_space *mapping, >>>>>>>>>>>> /* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */ >>>>>>>>>>>> if (wbc->sync_mode == WB_SYNC_ALL) >>>>>>>>>>>> atomic_inc(&sbi->wb_sync_req[DATA]); >>>>>>>>>>>> - else if (atomic_read(&sbi->wb_sync_req[DATA])) >>>>>>>>>>>> + else if (atomic_read(&sbi->wb_sync_req[DATA])) { >>>>>>>>>>>> + /* to avoid potential deadlock */ >>>>>>>>>>>> + if (current->plug) >>>>>>>>>>>> + blk_finish_plug(current->plug); >>>>>>>>>>>> goto skip_write; >>>>>>>>>>>> + } >>>>>>>>>>>> if (__should_serialize_io(inode, wbc)) { >>>>>>>>>>>> mutex_lock(&sbi->writepages); >>>>>>>>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c >>>>>>>>>>>> index 556fcd8457f3..69c6bcaf5aae 100644 >>>>>>>>>>>> --- a/fs/f2fs/node.c >>>>>>>>>>>> +++ b/fs/f2fs/node.c >>>>>>>>>>>> @@ -2106,8 +2106,12 @@ static int f2fs_write_node_pages(struct address_space *mapping, >>>>>>>>>>>> if (wbc->sync_mode == WB_SYNC_ALL) >>>>>>>>>>>> atomic_inc(&sbi->wb_sync_req[NODE]); >>>>>>>>>>>> - else if (atomic_read(&sbi->wb_sync_req[NODE])) >>>>>>>>>>>> + else if (atomic_read(&sbi->wb_sync_req[NODE])) { >>>>>>>>>>>> + /* to avoid potential deadlock */ >>>>>>>>>>>> + if (current->plug) >>>>>>>>>>>> + blk_finish_plug(current->plug); >>>>>>>>>>>> goto skip_write; >>>>>>>>>>>> + } >>>>>>>>>>>> trace_f2fs_writepages(mapping->host, wbc, NODE); >>>>>>>>>>>> -- >>>>>>>>>>>> 2.32.0 >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Linux-f2fs-devel mailing list >>>>>> Linux-f2fs-devel@...ts.sourceforge.net >>>>>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel >>>>> >>>>> >>>>> _______________________________________________ >>>>> Linux-f2fs-devel mailing list >>>>> Linux-f2fs-devel@...ts.sourceforge.net >>>>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Powered by blists - more mailing lists