lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <784a3686-cb54-561d-740c-30e0b3f46df8@acm.org>
Date:   Mon, 8 Mar 2021 11:59:23 -0800
From:   Bart Van Assche <bvanassche@....org>
To:     John Garry <john.garry@...wei.com>, hare@...e.de,
        ming.lei@...hat.com, axboe@...nel.dk, hch@....de
Cc:     linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        pragalla@...eaurora.org, kashyap.desai@...adcom.com,
        yuyufen@...wei.com
Subject: Re: [RFC PATCH v3 3/3] blk-mq: Lockout tagset iterator when exiting
 elevator

On 3/8/21 3:17 AM, John Garry wrote:
> On 06/03/2021 04:43, Bart Van Assche wrote:
>> On 3/5/21 7:14 AM, John Garry wrote:
>>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>>> index 7ff1b20d58e7..5950fee490e8 100644
>>> --- a/block/blk-mq-tag.c
>>> +++ b/block/blk-mq-tag.c
>>> @@ -358,11 +358,16 @@ void blk_mq_tagset_busy_iter(struct 
>>> blk_mq_tag_set *tagset,
>>>   {
>>>       int i;
>>> +    if (!atomic_inc_not_zero(&tagset->iter_usage_counter))
>>> +        return;
>>> +
>>>       for (i = 0; i < tagset->nr_hw_queues; i++) {
>>>           if (tagset->tags && tagset->tags[i])
>>>               __blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
>>>                             BT_TAG_ITER_STARTED);
>>>       }
>>> +
>>> +    atomic_dec(&tagset->iter_usage_counter);
>>>   }
>>>   EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
> 
> Hi Bart,
> 
>> This changes the behavior of blk_mq_tagset_busy_iter(). What will e.g.
>> happen if the mtip driver calls blk_mq_tagset_busy_iter(&dd->tags,
>> mtip_abort_cmd, dd) concurrently with another blk_mq_tagset_busy_iter()
>> call and if that causes all mtip_abort_cmd() calls to be skipped?
> 
> I'm not sure that I understand this problem you describe. So if 
> blk_mq_tagset_busy_iter(&dd->tags, mtip_abort_cmd, dd) is called, either 
> can happen:
> a. normal operation, iter_usage_counter initially holds >= 1, and then 
> iter_usage_counter is incremented in blk_mq_tagset_busy_iter() and we 
> iter the busy tags. Any parallel call to blk_mq_tagset_busy_iter() will 
> also increase iter_usage_counter.
> b. we're switching IO scheduler. In this scenario, first we quiesce all 
> queues. After that, there should be no active requests. At that point, 
> we ensure any calls to blk_mq_tagset_busy_iter() are finished and block 
> (or discard may be a better term) any more calls. Blocking any more 
> calls should be safe as there are no requests to iter. atomic_cmpxchg() 
> is used to set iter_usage_counter to 0, blocking any more calls.


Hi John,

My concern is about the insertion of the early return statement in 
blk_mq_tagset_busy_iter(). Although most blk_mq_tagset_busy_iter() 
callers can handle skipping certain blk_mq_tagset_busy_iter() calls 
(e.g. when gathering statistics), I'm not sure this is safe for all 
blk_mq_tagset_busy_iter() callers. The example I cited is an example of 
a blk_mq_tagset_busy_iter() call with side effects.

The mtip driver allocates one tag set per request queue so quiescing 
queues should be sufficient to address my concern for the mtip driver.

The NVMe core and SCSI core however share a single tag set across 
multiple namespaces / LUNs. In the error path of nvme_rdma_setup_ctrl() 
I found a call to nvme_cancel_tagset(). nvme_cancel_tagset() calls 
blk_mq_tagset_busy_iter(ctrl->tagset, nvme_cancel_request, ctrl). I'm 
not sure it is safe to skip the nvme_cancel_request() calls if the I/O 
scheduler for another NVMe namespace is being modified.

Thanks,

Bart.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ