lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191030170246.GB693@sol.localdomain>
Date:   Wed, 30 Oct 2019 10:02:46 -0700
From:   Eric Biggers <ebiggers@...nel.org>
To:     Chao Yu <yuchao0@...wei.com>
Cc:     Jaegeuk Kim <jaegeuk@...nel.org>, linux-kernel@...r.kernel.org,
        linux-f2fs-devel@...ts.sourceforge.net
Subject: Re: [PATCH 2/2] f2fs: support data compression

On Wed, Oct 30, 2019 at 04:43:52PM +0800, Chao Yu wrote:
> >>>>  static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
> >>>>  {
> >>>> -	/*
> >>>> -	 * We use different work queues for decryption and for verity because
> >>>> -	 * verity may require reading metadata pages that need decryption, and
> >>>> -	 * we shouldn't recurse to the same workqueue.
> >>>> -	 */
> >>>
> >>> Why is it okay (i.e., no deadlocks) to no longer use different work queues for
> >>> decryption and for verity?  See the comment above which is being deleted.
> >>
> >> Could you explain more about how deadlock happen? or share me a link address if
> >> you have described that case somewhere?
> >>
> > 
> > The verity work can read pages from the file which require decryption.  I'm
> > concerned that it could deadlock if the work is scheduled on the same workqueue.
> 
> I assume you've tried one workqueue, and suffered deadlock..
> 
> > Granted, I'm not an expert in Linux workqueues, so if you've investigated this
> > and determined that it's safe, can you explain why?
> 
> I'm not familiar with workqueue...  I guess it may not safe that if the work is
> scheduled to the same cpu in where verity was waiting for data? if the work is
> scheduled to other cpu, it may be safe.
> 
> I can check that before splitting the workqueue for verity and decrypt/decompress.
> 

Yes this is a real problem, try 'kvm-xfstests -c f2fs/encrypt generic/579'.
The worker thread gets deadlocked in f2fs_read_merkle_tree_page() waiting for
the Merkle tree page to be decrypted.  This is with the v2 compression patch;
it works fine on current mainline.

INFO: task kworker/u5:0:61 blocked for more than 30 seconds.
      Not tainted 5.4.0-rc1-00119-g464e31ba60d0 #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u5:0    D    0    61      2 0x80004000
Workqueue: f2fs_post_read_wq f2fs_post_read_work
Call Trace:
 context_switch kernel/sched/core.c:3384 [inline]
 __schedule+0x299/0x6c0 kernel/sched/core.c:4069
 schedule+0x44/0xd0 kernel/sched/core.c:4136
 io_schedule+0x11/0x40 kernel/sched/core.c:5780
 wait_on_page_bit_common mm/filemap.c:1174 [inline]
 wait_on_page_bit mm/filemap.c:1223 [inline]
 wait_on_page_locked include/linux/pagemap.h:527 [inline]
 wait_on_page_locked include/linux/pagemap.h:524 [inline]
 wait_on_page_read mm/filemap.c:2767 [inline]
 do_read_cache_page+0x407/0x660 mm/filemap.c:2810
 read_cache_page+0xd/0x10 mm/filemap.c:2894
 f2fs_read_merkle_tree_page+0x2e/0x30 include/linux/pagemap.h:396
 verify_page+0x110/0x560 fs/verity/verify.c:120
 fsverity_verify_bio+0xe6/0x1a0 fs/verity/verify.c:239
 verity_work fs/f2fs/data.c:142 [inline]
 f2fs_post_read_work+0x36/0x50 fs/f2fs/data.c:160
 process_one_work+0x225/0x550 kernel/workqueue.c:2269
 worker_thread+0x4b/0x3c0 kernel/workqueue.c:2415
 kthread+0x125/0x140 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
INFO: task kworker/u5:1:1140 blocked for more than 30 seconds.
      Not tainted 5.4.0-rc1-00119-g464e31ba60d0 #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u5:1    D    0  1140      2 0x80004000
Workqueue: f2fs_post_read_wq f2fs_post_read_work
Call Trace:
 context_switch kernel/sched/core.c:3384 [inline]
 __schedule+0x299/0x6c0 kernel/sched/core.c:4069
 schedule+0x44/0xd0 kernel/sched/core.c:4136
 io_schedule+0x11/0x40 kernel/sched/core.c:5780
 wait_on_page_bit_common mm/filemap.c:1174 [inline]
 wait_on_page_bit mm/filemap.c:1223 [inline]
 wait_on_page_locked include/linux/pagemap.h:527 [inline]
 wait_on_page_locked include/linux/pagemap.h:524 [inline]
 wait_on_page_read mm/filemap.c:2767 [inline]
 do_read_cache_page+0x407/0x660 mm/filemap.c:2810
 read_cache_page+0xd/0x10 mm/filemap.c:2894
 f2fs_read_merkle_tree_page+0x2e/0x30 include/linux/pagemap.h:396
 verify_page+0x110/0x560 fs/verity/verify.c:120
 fsverity_verify_bio+0xe6/0x1a0 fs/verity/verify.c:239
 verity_work fs/f2fs/data.c:142 [inline]
 f2fs_post_read_work+0x36/0x50 fs/f2fs/data.c:160
 process_one_work+0x225/0x550 kernel/workqueue.c:2269
 worker_thread+0x4b/0x3c0 kernel/workqueue.c:2415
 kthread+0x125/0x140 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Showing all locks held in the system:
1 lock held by khungtaskd/21:
 #0: ffffffff82250520 (rcu_read_lock){....}, at: rcu_lock_acquire.constprop.0+0x0/0x30 include/trace/events/lock.h:13
2 locks held by kworker/u5:0/61:
 #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
 #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
 #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
 #1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
 #1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
 #1: ffffc90000253e50 ((work_completion)(&ctx->work)){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
2 locks held by kworker/u5:1/1140:
 #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
 #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
 #0: ffff88807b78eb28 ((wq_completion)f2fs_post_read_wq){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240
 #1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_data kernel/workqueue.c:619 [inline]
 #1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
 #1: ffffc9000174be50 ((work_completion)(&ctx->work)){+.+.}, at: process_one_work+0x1ad/0x550 kernel/workqueue.c:2240

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ