Message-ID: <20120529014128.GF20954@dhcp-172-17-108-109.mtv.corp.google.com>
Date: Tue, 29 May 2012 10:41:28 +0900
From: Tejun Heo <tj@...nel.org>
To: Asias He <asias@...hat.com>
Cc: Jens Axboe <axboe@...nel.dk>, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V3] block: Mitigate lock unbalance caused by lock switching

On Tue, May 29, 2012 at 09:39:01AM +0800, Asias He wrote:
> Commit 777eb1bf15b8532c396821774bf6451e563438f5 disconnects the externally
> supplied queue_lock before blk_drain_queue(). Switching the lock would
> introduce lock unbalance because threads which have taken the external
> lock might unlock the internal lock during the queue drain. This
> patch mitigates this by disconnecting the lock after the queue draining,
> since queue draining makes a lot of request_queue users go away.
>
> However, please note that this patch only makes the problem less likely to
> happen. Anyone who still holds a ref might try to issue a new request on
> a dead queue after blk_cleanup_queue() finishes draining, and the lock
> unbalance might still happen in that case.
>
> =====================================
> [ BUG: bad unlock balance detected! ]
> 3.4.0+ #288 Not tainted
> -------------------------------------
> fio/17706 is trying to release lock (&(&q->__queue_lock)->rlock) at:
> [<ffffffff81329372>] blk_queue_bio+0x2a2/0x380
> but there are no more locks to release!
>
> other info that might help us debug this:
> 1 lock held by fio/17706:
> #0: (&(&vblk->lock)->rlock){......}, at: [<ffffffff81327f1a>]
> get_request_wait+0x19a/0x250
>
> stack backtrace:
> Pid: 17706, comm: fio Not tainted 3.4.0+ #288
> Call Trace:
> [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
> [<ffffffff810dea49>] print_unlock_inbalance_bug+0xf9/0x100
> [<ffffffff810dfe4f>] lock_release_non_nested+0x1df/0x330
> [<ffffffff811dae24>] ? dio_bio_end_aio+0x34/0xc0
> [<ffffffff811d6935>] ? bio_check_pages_dirty+0x85/0xe0
> [<ffffffff811daea1>] ? dio_bio_end_aio+0xb1/0xc0
> [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
> [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
> [<ffffffff810e0079>] lock_release+0xd9/0x250
> [<ffffffff81a74553>] _raw_spin_unlock_irq+0x23/0x40
> [<ffffffff81329372>] blk_queue_bio+0x2a2/0x380
> [<ffffffff81328faa>] generic_make_request+0xca/0x100
> [<ffffffff81329056>] submit_bio+0x76/0xf0
> [<ffffffff8115470c>] ? set_page_dirty_lock+0x3c/0x60
> [<ffffffff811d69e1>] ? bio_set_pages_dirty+0x51/0x70
> [<ffffffff811dd1a8>] do_blockdev_direct_IO+0xbf8/0xee0
> [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
> [<ffffffff811dd4e5>] __blockdev_direct_IO+0x55/0x60
> [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
> [<ffffffff811d92e7>] blkdev_direct_IO+0x57/0x60
> [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
> [<ffffffff8114c6ae>] generic_file_aio_read+0x70e/0x760
> [<ffffffff810df7c5>] ? __lock_acquire+0x215/0x5a0
> [<ffffffff811e9924>] ? aio_run_iocb+0x54/0x1a0
> [<ffffffff8114bfa0>] ? grab_cache_page_nowait+0xc0/0xc0
> [<ffffffff811e82cc>] aio_rw_vect_retry+0x7c/0x1e0
> [<ffffffff811e8250>] ? aio_fsync+0x30/0x30
> [<ffffffff811e9936>] aio_run_iocb+0x66/0x1a0
> [<ffffffff811ea9b0>] do_io_submit+0x6f0/0xb80
> [<ffffffff8134de2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff811eae50>] sys_io_submit+0x10/0x20
> [<ffffffff81a7c9e9>] system_call_fastpath+0x16/0x1b
>
> Changes since v2: Update commit log to explain how the code is still
> broken even if we delay the lock switching until after the drain.
> Changes since v1: Update commit log as Tejun suggested.
>
> Signed-off-by: Asias He <asias@...hat.com>
Acked-by: Tejun Heo <tj@...nel.org>
Thanks.
--
tejun
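
The hazard described in the commit message above can be modelled outside the
kernel. Below is a minimal userspace C sketch of the same shape of bug: a
structure whose lock pointer is switched from an externally supplied lock to
an internal one while another thread sits between lock and unlock. All names
here (fake_queue, driver_lock, submit(), cleanup()) are hypothetical stand-ins
for q->queue_lock, the driver's lock, blk_queue_bio() and blk_cleanup_queue();
it is an illustration of the race, not the actual block-layer code.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct fake_queue {
	pthread_mutex_t internal_lock;   /* plays the role of q->__queue_lock */
	pthread_mutex_t *queue_lock;     /* plays the role of q->queue_lock   */
};

static pthread_mutex_t driver_lock = PTHREAD_MUTEX_INITIALIZER; /* "external" lock */
static struct fake_queue q = {
	.internal_lock = PTHREAD_MUTEX_INITIALIZER,
	.queue_lock    = &driver_lock,
};

/* Models a request submitter: takes the queue lock, sleeps, releases it. */
static void *submit(void *unused)
{
	(void)unused;
	pthread_mutex_lock(q.queue_lock);    /* grabs driver_lock                    */
	sleep(1);                            /* cleanup() switches the pointer here  */
	pthread_mutex_unlock(q.queue_lock);  /* releases internal_lock, which this
	                                      * thread never took: the unbalance     */
	return NULL;
}

/* Models the lock switch done by blk_cleanup_queue(). */
static void cleanup(void)
{
	q.queue_lock = &q.internal_lock;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, submit, NULL);
	usleep(100 * 1000);     /* let submit() take driver_lock first */
	cleanup();              /* switch the lock out from under it   */
	pthread_join(t, NULL);
	/* driver_lock is still held; internal_lock was "unlocked" while free. */
	printf("lock pointer switched mid-critical-section: unbalance\n");
	return 0;
}

Delaying the switch until after blk_drain_queue(), as the patch does, shrinks
the window in which submit()-like paths can be caught mid-critical-section,
but anyone still holding a queue reference can hit it afterwards, which is why
the commit message calls this a mitigation rather than a fix.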