Message-Id: <5CF89C35-09E2-4BF4-877D-7D82F33D6356@linaro.org>
Date: Wed, 21 Oct 2020 08:05:26 +0200
From: Paolo Valente <paolo.valente@...aro.org>
To: Jens Axboe <axboe@...nel.dk>
Cc: Mike Galbraith <efault@....de>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: block, bfq: lockdep circular locking dependency gripe
> Il giorno 20 ott 2020, alle ore 18:54, Jens Axboe <axboe@...nel.dk> ha scritto:
>
> On 10/20/20 1:15 AM, Paolo Valente wrote:
>>> Il giorno 20 ott 2020, alle ore 08:15, Mike Galbraith <efault@....de> ha scritto:
>>>
>>> [ 1917.361401] ======================================================
>>> [ 1917.361406] WARNING: possible circular locking dependency detected
>>> [ 1917.361413] 5.9.0.g7cf726a-master #2 Tainted: G S E
>>> [ 1917.361417] ------------------------------------------------------
>>> [ 1917.361422] kworker/u16:35/15995 is trying to acquire lock:
>>> [ 1917.361428] ffff89232237f7e0 (&ioc->lock){..-.}-{2:2}, at: put_io_context+0x30/0x90
>>> [ 1917.361440]
>>> but task is already holding lock:
>>> [ 1917.361445] ffff892244d2cc08 (&bfqd->lock){-.-.}-{2:2}, at: bfq_insert_requests+0x89/0x680
>>> [ 1917.361456]
>>> which lock already depends on the new lock.
>>>
>>> [ 1917.361463]
>>> the existing dependency chain (in reverse order) is:
>>> [ 1917.361469]
>>> -> #1 (&bfqd->lock){-.-.}-{2:2}:
>>> [ 1917.361479] _raw_spin_lock_irqsave+0x3d/0x50
>>> [ 1917.361484] bfq_exit_icq_bfqq+0x48/0x3f0
>>> [ 1917.361489] bfq_exit_icq+0x13/0x20
>>> [ 1917.361494] put_io_context_active+0x55/0x80
>>> [ 1917.361499] do_exit+0x72c/0xca0
>>> [ 1917.361504] do_group_exit+0x47/0xb0
>>> [ 1917.361508] __x64_sys_exit_group+0x14/0x20
>>> [ 1917.361513] do_syscall_64+0x33/0x40
>>> [ 1917.361518] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [ 1917.361523]
>>> -> #0 (&ioc->lock){..-.}-{2:2}:
>>> [ 1917.361532] __lock_acquire+0x149d/0x1a70
>>> [ 1917.361537] lock_acquire+0x1a7/0x3b0
>>> [ 1917.361542] _raw_spin_lock_irqsave+0x3d/0x50
>>> [ 1917.361547] put_io_context+0x30/0x90
>>> [ 1917.361552] blk_mq_free_request+0x4f/0x140
>>> [ 1917.361557] blk_attempt_req_merge+0x19/0x30
>>> [ 1917.361563] elv_attempt_insert_merge+0x4f/0x90
>>> [ 1917.361568] blk_mq_sched_try_insert_merge+0x28/0x40
>>> [ 1917.361574] bfq_insert_requests+0x94/0x680
>>> [ 1917.361579] blk_mq_sched_insert_requests+0xd1/0x2a0
>>> [ 1917.361584] blk_mq_flush_plug_list+0x12d/0x240
>>> [ 1917.361589] blk_flush_plug_list+0xb4/0xd0
>>> [ 1917.361594] io_schedule_prepare+0x3c/0x40
>>> [ 1917.361599] io_schedule+0xb/0x40
>>> [ 1917.361604] blk_mq_get_tag+0x13a/0x250
>>> [ 1917.361608] __blk_mq_alloc_request+0x5c/0x130
>>> [ 1917.361613] blk_mq_submit_bio+0xf3/0x770
>>> [ 1917.361618] submit_bio_noacct+0x41e/0x4b0
>>> [ 1917.361622] submit_bio+0x33/0x160
>>> [ 1917.361644] ext4_io_submit+0x49/0x60 [ext4]
>>> [ 1917.361661] ext4_writepages+0x683/0x1070 [ext4]
>>> [ 1917.361667] do_writepages+0x3c/0xe0
>>> [ 1917.361672] __writeback_single_inode+0x62/0x630
>>> [ 1917.361677] writeback_sb_inodes+0x218/0x4d0
>>> [ 1917.361681] __writeback_inodes_wb+0x5f/0xc0
>>> [ 1917.361686] wb_writeback+0x283/0x490
>>> [ 1917.361691] wb_workfn+0x29a/0x670
>>> [ 1917.361696] process_one_work+0x283/0x620
>>> [ 1917.361701] worker_thread+0x39/0x3f0
>>> [ 1917.361706] kthread+0x152/0x170
>>> [ 1917.361711] ret_from_fork+0x1f/0x30
>>> [ 1917.361715]
>>> other info that might help us debug this:
>>>
>>> [ 1917.361722] Possible unsafe locking scenario:
>>>
>>> [ 1917.361728] CPU0 CPU1
>>> [ 1917.361731] ---- ----
>>> [ 1917.361736] lock(&bfqd->lock);
>>> [ 1917.361740] lock(&ioc->lock);
>>> [ 1917.361746] lock(&bfqd->lock);
>>> [ 1917.361752] lock(&ioc->lock);
>>> [ 1917.361757]
>>> *** DEADLOCK ***
>>>
>>> [ 1917.361763] 5 locks held by kworker/u16:35/15995:
>>> [ 1917.361767] #0: ffff892240c9bd38 ((wq_completion)writeback){+.+.}-{0:0}, at: process_one_work+0x1fa/0x620
>>> [ 1917.361778] #1: ffff94569342fe78 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x1fa/0x620
>>> [ 1917.361789] #2: ffff8921424ae0e0 (&type->s_umount_key#39){++++}-{3:3}, at: trylock_super+0x16/0x50
>>> [ 1917.361800] #3: ffff8921424aaa40 (&sbi->s_writepages_rwsem){.+.+}-{0:0}, at: do_writepages+0x3c/0xe0
>>> [ 1917.361811] #4: ffff892244d2cc08 (&bfqd->lock){-.-.}-{2:2}, at: bfq_insert_requests+0x89/0x680
>>> [ 1917.361821]
>>> stack backtrace:
>>> [ 1917.361827] CPU: 6 PID: 15995 Comm: kworker/u16:35 Kdump: loaded Tainted: G S E 5.9.0.g7cf726a-master #2
>>> [ 1917.361833] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
>>> [ 1917.361840] Workqueue: writeback wb_workfn (flush-8:32)
>>> [ 1917.361846] Call Trace:
>>> [ 1917.361854] dump_stack+0x77/0x97
>>> [ 1917.361860] check_noncircular+0xe7/0x100
>>> [ 1917.361866] ? __lock_acquire+0x2ce/0x1a70
>>> [ 1917.361872] ? __lock_acquire+0x149d/0x1a70
>>> [ 1917.361877] __lock_acquire+0x149d/0x1a70
>>> [ 1917.361884] lock_acquire+0x1a7/0x3b0
>>> [ 1917.361889] ? put_io_context+0x30/0x90
>>> [ 1917.361894] ? bfq_put_queue+0xcf/0x480
>>> [ 1917.361901] _raw_spin_lock_irqsave+0x3d/0x50
>>> [ 1917.361906] ? put_io_context+0x30/0x90
>>> [ 1917.361911] put_io_context+0x30/0x90
>>> [ 1917.361916] blk_mq_free_request+0x4f/0x140
>>> [ 1917.361921] blk_attempt_req_merge+0x19/0x30
>>> [ 1917.361926] elv_attempt_insert_merge+0x4f/0x90
>>> [ 1917.361932] blk_mq_sched_try_insert_merge+0x28/0x40
>>> [ 1917.361937] bfq_insert_requests+0x94/0x680
>>> [ 1917.361944] blk_mq_sched_insert_requests+0xd1/0x2a0
>>> [ 1917.361950] blk_mq_flush_plug_list+0x12d/0x240
>>> [ 1917.361956] blk_flush_plug_list+0xb4/0xd0
>>> [ 1917.361962] io_schedule_prepare+0x3c/0x40
>>> [ 1917.361967] io_schedule+0xb/0x40
>>> [ 1917.361972] blk_mq_get_tag+0x13a/0x250
>>> [ 1917.361978] ? wait_woken+0xa0/0xa0
>>> [ 1917.361984] __blk_mq_alloc_request+0x5c/0x130
>>> [ 1917.361989] blk_mq_submit_bio+0xf3/0x770
>>> [ 1917.361996] submit_bio_noacct+0x41e/0x4b0
>>> [ 1917.362002] ? submit_bio+0x33/0x160
>>> [ 1917.362007] submit_bio+0x33/0x160
>>> [ 1917.362028] ext4_io_submit+0x49/0x60 [ext4]
>>> [ 1917.362045] ext4_writepages+0x683/0x1070 [ext4]
>>> [ 1917.362056] ? do_writepages+0x3c/0xe0
>>> [ 1917.362060] do_writepages+0x3c/0xe0
>>> [ 1917.362067] ? __writeback_single_inode+0x62/0x630
>>> [ 1917.362072] __writeback_single_inode+0x62/0x630
>>> [ 1917.362078] writeback_sb_inodes+0x218/0x4d0
>>> [ 1917.362087] __writeback_inodes_wb+0x5f/0xc0
>>> [ 1917.362093] wb_writeback+0x283/0x490
>>> [ 1917.362100] ? wb_workfn+0x29a/0x670
>>> [ 1917.362104] wb_workfn+0x29a/0x670
>>> [ 1917.362112] ? process_one_work+0x283/0x620
>>> [ 1917.362117] ? process_one_work+0x251/0x620
>>> [ 1917.362121] process_one_work+0x283/0x620
>>> [ 1917.362128] worker_thread+0x39/0x3f0
>>> [ 1917.362133] ? process_one_work+0x620/0x620
>>> [ 1917.362138] kthread+0x152/0x170
>>> [ 1917.362142] ? kthread_park+0x90/0x90
>>> [ 1917.362148] ret_from_fork+0x1f/0x30
>>
>> Hi,
>> that's apparently hard to solve inside bfq. The ioc of the task is
>> being exited while the task is still inside the code that gets an I/O
>> request served. Is this still normal?
>
> (please don't top post...)
>
> First of all, never assume you have to work around what appear to be
> core issues in BFQ. That doesn't mean the problem is unsolvable, just
> that it might need solving outside of BFQ.
>
One of us is not seeing the core problem here, and for sure it's me. Upstream of the circular dependency addressed by your patch attempt, the problem I see is that the icq can be destroyed while other icq-based code is still being executed.
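Just to fix ideas on the cycle itself, here is how I read the trace, reduced to a userspace toy. This is not kernel code: ioc_lock and bfqd_lock only stand in for ioc->lock and bfqd->lock, and each thread mimics one of the two call paths lockdep found.

/*
 * Userspace toy, only to show the inversion; with unlucky timing
 * the two threads deadlock against each other.
 * Build with: cc toy.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t ioc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bfqd_lock = PTHREAD_MUTEX_INITIALIZER;

/* dependency #1 in the trace: task exit, ioc->lock -> bfqd->lock */
static void *exit_path(void *arg)
{
	pthread_mutex_lock(&ioc_lock);	/* put_io_context_active() */
	pthread_mutex_lock(&bfqd_lock);	/* bfq_exit_icq_bfqq() */
	pthread_mutex_unlock(&bfqd_lock);
	pthread_mutex_unlock(&ioc_lock);
	return NULL;
}

/* dependency #0 in the trace: insertion, bfqd->lock -> ioc->lock */
static void *insert_path(void *arg)
{
	pthread_mutex_lock(&bfqd_lock);	/* bfq_insert_requests() */
	pthread_mutex_lock(&ioc_lock);	/* put_io_context() */
	pthread_mutex_unlock(&ioc_lock);
	pthread_mutex_unlock(&bfqd_lock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, exit_path, NULL);
	pthread_create(&t2, NULL, insert_path, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("lucky timing, no deadlock this run\n");
	return 0;
}

That cycle is the part your patch attempt addresses. The lifetime problem is what worries me more.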
In the exact case reported in this thread, we are in the middle of inserting a request. So we may reach, e.g., this instruction in the middle of bfq_init_rq():
bic = icq_to_bic(rq->elv.icq);
but the icq that rq->elv.icq points to has already been freed: rq->elv.icq is a dangling pointer.
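To make the window explicit, here is a minimal sketch of what I mean, with made-up stub types (this is not the real code, just the shape of the race):

/* made-up stubs, not the real definitions */
struct io_cq { int dummy; };

struct request_stub {
	struct io_cq *icq;		/* stands for rq->elv.icq */
};

static void toy_init_rq(struct request_stub *rq)
{
	/*
	 * Corresponds to "bic = icq_to_bic(rq->elv.icq)" in
	 * bfq_init_rq().  If the owning task exited in the meantime,
	 * put_io_context_active() -> bfq_exit_icq() has already torn
	 * the icq down, this pointer is dangling, and any dereference
	 * from here on is a use-after-free.
	 */
	struct io_cq *icq = rq->icq;
	(void)icq;
}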
What am I missing?
Thanks,
Paolo