linux-kernel - Re: [PATCH 0/2] Handle update hardware queues and queue freeze more carefully


Message-ID: <CAFj5m9J9NL9qHjo1X9=PdE1-Nkgj2zV-ifdZ9aqqts2QNUpf8w@mail.gmail.com>
Date:   Tue, 29 Jun 2021 09:31:03 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Daniel Wagner <dwagner@...e.de>
Cc:     linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
        James Smart <james.smart@...adcom.com>,
        Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...com>,
        Sagi Grimberg <sagi@...mberg.me>
Subject: Re: [PATCH 0/2] Handle update hardware queues and queue freeze more carefully

On Fri, Jun 25, 2021 at 9:00 PM Ming Lei <ming.lei@...hat.com> wrote:
>
> On Fri, Jun 25, 2021 at 02:21:56PM +0200, Daniel Wagner wrote:
> > On Fri, Jun 25, 2021 at 12:16:47PM +0200, Daniel Wagner wrote:
> > > this is a followup on the crash I reported in
> > >
> > >   https://lore.kernel.org/linux-block/20210608183339.70609-1-dwagner@suse.de/
> > >
> > > By moving the hardware check up the crash was gone. Unfortunately, I
> > > don't understand why this fixes the crash. The per-cpu access is
> > > crashing but I can't see why the blk_mq_update_nr_hw_queues() is
> > > fixing this problem.
> > >
> > > Even though I can't explain why it fixes it, I think it makes sense to
> > > update the hardware queue mapping before we recreate the IO
> > > queues. Thus I avoided saying in the commit message that it fixes
> > > something.
> >
> > I just discussed this with Hannes and we figured out how the crash is
> > fixed by moving the blk_mq_update_nr_hw_queues() before the
> > nvme_fc_create_hw_io_queues()/nvme_fc_connect_io_queues().
> >
> > First of all, blk_mq_update_nr_hw_queues() operates on the normal
> > tag_set and not the admin_tag_set. That means when we move the
> > blk_mq_update_nr_hw_queues() before the nvme_fc_connect_io_queues(), we
> > update the mapping to only CPUs and hwctx which are available. When we
> > then do the connect call nvmf_connect_io_queue() we will only allocate
> > tags from queues which are no longer in the BLK_MQ_S_INACTIVE state. Hence
> > we skip the blk_mq_put_tag() call.
>
> Your patch just reduces the race window. What if all CPUs in
> hctx->cpumask become offline when calling blk_mq_alloc_request_hctx()?

Connecting the IO queues after updating nr_hw_queues lets the correct
hctx_idx be passed to blk_mq_alloc_request_hctx(), so the patch looks
good.

Yeah, there is still another issue during CPU hotplug that this does not
cover, but that is a different problem from this one.

Thanks,