Message-ID: <20170315162158.GA18768@ming.t460p>
Date:   Thu, 16 Mar 2017 00:22:03 +0800
From:   Ming Lei <tom.leiming@...il.com>
To:     Bart Van Assche <Bart.VanAssche@...disk.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "hch@...radead.org" <hch@...radead.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "yizhan@...hat.com" <yizhan@...hat.com>,
        "axboe@...com" <axboe@...com>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH 1/2] blk-mq: don't complete un-started request in timeout handler

On Wed, Mar 15, 2017 at 03:36:31PM +0000, Bart Van Assche wrote:
> On Wed, 2017-03-15 at 20:40 +0800, Ming Lei wrote:
> > On Wed, Mar 15, 2017 at 08:18:53PM +0800, Ming Lei wrote:
> > > On Wed, Mar 15, 2017 at 12:07:37AM +0000, Bart Van Assche wrote:
> > > 
> > > > or __blk_mq_requeue_request(). Another issue with this function is that the
> > > 
> > > __blk_mq_requeue_request() can be run from two paths:
> > > 
> > > 	- dispatch failure, in which case the req/tag isn't released to tag set
> > > 	
> > > 	- IO completion path, in which COMPLETE flag is cleared before requeue.
> > > 	
> > > so I can't see any race with the timeout handler between starting and requeuing a request.
> > 
> > Actually the rq/tag won't be released to the tag set if it will be requeued, so
> > the timeout race has nothing to do with requeue.
> 
> Hello Ming,
> 
> Please have another look at __blk_mq_requeue_request(). In that function
> the following code occurs: if (test_and_clear_bit(REQ_ATOM_STARTED,
> &rq->atomic_flags)) { ... }
> 
> I think the REQ_ATOM_STARTED check in blk_mq_check_expired() races with the
> test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags) call in
> __blk_mq_requeue_request().

OK, this race should only exist when the requeue happens after dispatch returns
busy, because the COMPLETE flag isn't set in that case. If the requeue comes from
the IO completion path, there is no such race because the COMPLETE flag is set.

One solution I thought of is to call blk_mark_rq_complete() before requeuing
when dispatch returns busy, but that looks a bit silly. Another way is to
set the STARTED flag just after .queue_rq returns BLK_MQ_RQ_QUEUE_OK, which
looks reasonable too. Any comments on the second solution?


Thanks,
Ming
