linux-kernel - Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180119170519.GF14827@ming.t460p>
Date:   Sat, 20 Jan 2018 01:05:20 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Jens Axboe <axboe@...nel.dk>
Cc:     Mike Snitzer <snitzer@...hat.com>,
        Bart Van Assche <Bart.VanAssche@....com>,
        "dm-devel@...hat.com" <dm-devel@...hat.com>,
        "hch@...radead.org" <hch@...radead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "osandov@...com" <osandov@...com>
Subject: Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle

On Fri, Jan 19, 2018 at 09:52:32AM -0700, Jens Axboe wrote:
> On 1/19/18 9:47 AM, Mike Snitzer wrote:
> > On Fri, Jan 19 2018 at 11:41am -0500,
> > Jens Axboe <axboe@...nel.dk> wrote:
> > 
> >> On 1/19/18 9:37 AM, Ming Lei wrote:
> >>> On Fri, Jan 19, 2018 at 09:27:46AM -0700, Jens Axboe wrote:
> >>>> On 1/19/18 9:26 AM, Ming Lei wrote:
> >>>>> On Fri, Jan 19, 2018 at 09:19:24AM -0700, Jens Axboe wrote:
> >>>>
> >>>> There are no pending requests for this case, nothing to restart the
> >>>> queue. When you fail that blk_get_request(), you are idle, nothing
> >>>> is pending.
> >>>
> >>> I think we needn't worry about that, once a device is attached to
> >>> dm-rq, it can't be mounted any more, and usually user don't use the device
> >>> directly and by dm-mpath at the same time.
> >>
> >> Even if it doesn't happen for a normal dm setup, it is a case that
> >> needs to be handled. The request allocation is just one example of
> >> a wider scope resource that can be unavailable. If the driver returns
> >> NO_DEV_RESOURCE (or whatever name), it will be a possibility that
> >> the device itself is currently idle.
> > 
> > How would a driver's resources be exhausted yet the device is idle (so
> > as not to be able to benefit from RESTART)?
> 
> I've outlined a number of these examples already. Another case might be:
> 
> 1) Device is idle
> 2) Device gets request
> 3) Device attempts to DMA map
> 4) DMA map fails because the IOMMU is out of space (nic is using it all)
> 5) Device returns STS_RESOURCE
> 6) Queue is marked as needing a restart
> 
> All's well, except there is no IO on this device that will notice the
> restart bit and retry the operation.
> 
> Replace IOMMU failure with any other resource that the driver might need
> for an IO, which isn't tied to a device specific resource.
> blk_get_request() on dm is an example, as is any allocation failure
> occurring in the queue IO path for the driver.

Yeah, if we decide to take the approach of introducing NO_DEV_RESOURCE, all
the current STS_RESOURCE for non-device resource allocation(kmalloc, dma
map, get_request, ...) should be converted to NO_DEV_RESOURCE.

And it is a generic issue, which need generic solution.

Seems running queue after arbitrary in this case is the only way we
thought of, or other solutions?

If the decision is made, let's make/review the patch, :-)

-- 
Ming