lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 15 Aug 2013 19:19:29 -0700
From:	"Nicholas A. Bellinger" <nab@...ux-iscsi.org>
To:	Alexander Gordeev <agordeev@...hat.com>
Cc:	Mike Christie <mchristie@...ionio.com>,
	James Bottomley <James.Bottomley@...senPartnership.com>,
	Jens Axboe <axboe@...nel.dk>, Tejun Heo <tj@...nel.org>,
	linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org,
	Jeff Garzik <jgarzik@...ox.com>,
	linux-scsi <linux-scsi@...r.kernel.org>
Subject: Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

On Thu, 2013-08-15 at 18:23 +0200, Alexander Gordeev wrote:
> On Fri, Aug 09, 2013 at 01:17:37PM -0700, Nicholas A. Bellinger wrote:
> > On Fri, 2013-08-09 at 21:15 +0200, Alexander Gordeev wrote:
> > Mmmm, I'm able to reproduce over here with ahci + scsi-mq, and it
> > appears to be a bug related with using sdev->sdev_md_req.queue_depth=1,
> > that ends up causing the blkdev_issue_flush() to wait forever because
> > blk_mq_wait_for_tags() never ends up getting the single tag back for the
> > WRITE_FLUSH bio -> SYNCHRONIZE_CACHE cdb.
> 
> It turns out this way - blkdev_issue_flush() claims the only tag, submits
> the bio and waits for the completion. But because blk_mq_make_request()
> does not mark any context in blk_mq_hw_ctx::ctx_map (nor enslists the request
> into blk_mq_ctx::rq_list) it never gets processed from blk_mq_work_fn->
> __blk_mq_run_hw_queue() and blkdev_issue_flush() waits endlessly. All other
> requests are just waiting for the tag availability as result.
> 

Ok, here's a bit better idea of what is going on now..

The problem is that blkdev_issue_flush() -> blk_mq_make_request() ->
__blk_mq_alloc_request() allocates the first tag, which calls
blk_insert_flush() -> blk_flush_complete_seq() -> blk_flush_kick() ->
mq_flush_work() -> blk_mq_alloc_request() to allocate a second tag for
the struct request that actually gets dispatched into scsi-mq as a
SYCHRONIZE_CACHE command..

I'm not exactly sure why this double tag usage of struct request is
occurring, but AFAICT it does happen for every flush, and is not
specific to the blkdev_issue_flush() codepath..  I'm sure that Jens can
fill us in on that bit.  ;)

So, assuming that this double tag usage is necessary and not a bug,
perhaps using a reserved tag for the first tag (eg: the one that's not
dispatched into scsi_mq_queue_rq) makes sense..?

I'm playing with a patch to do this, but am currently getting hung-up on
what appear to be some separate blk-mq reserved_tags > 0 bugs, the first
of which is passing queue_depth=1 + reserved_tags=1 is broken, and
results in tags->nr_free = 0.

Here's the quick fix:

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 6718007..ffdf686 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -470,7 +470,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
 	 * Rest of the tags start at the queue list
 	 */
 	tags->nr_free = 0;
-	while (nr_tags - tags->nr_reserved) {
+	while (nr_tags) {
 		tags->freelist[tags->nr_free] = tags->nr_free +
 							tags->nr_reserved;
 		nr_tags--;

Anyways, before digging further into reserved tags logic, Jens, what are
your thoughts for addressing this special queue_depth=1 case for libata
+ the like..?

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ