Message-ID: <alpine.LNX.2.00.1505221453230.15930@localhost.lm.intel.com>
Date: Fri, 22 May 2015 15:11:44 +0000 (UTC)
From: Keith Busch <keith.busch@...el.com>
To: Parav Pandit <parav.pandit@...gotech.com>
cc: Keith Busch <keith.busch@...el.com>,
linux-nvme@...ts.infradead.org,
Matthew Wilcox <willy@...ux.intel.com>,
Jens Axboe <axboe@...nel.dk>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] NVMe: Avoid interrupt disable during queue init.
On Fri, 22 May 2015, Parav Pandit wrote:
> On Fri, May 22, 2015 at 8:18 PM, Keith Busch <keith.busch@...el.com> wrote:
>> The rcu protection on nvme queues was removed with the blk-mq conversion
>> as we rely on that layer for h/w access.
>
> o.k. But the above is at a level where data I/Os are not even active.
> It's between nvme_kthread and nvme_resume() from the power management
> subsystem.
> I must be missing something.
On resume, everything is already reaped from the queues, so there should
be no harm in letting the kthread poll an inactive queue. The proposal to
remove the q_lock during queue init makes it possible for the thread to
see the wrong cq phase bit and corrupt the completion queue's head by
reaping non-existent entries.
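
To make the hazard concrete, here is a minimal sketch of the phase-tag
polling, shaped like the driver's nvme_process_cq() but using
illustrative stand-in types rather than the driver's actual structures:

#include <stdint.h>
#include <stdbool.h>

/* Illustrative stand-ins for the driver's types; not nvme-core.c code. */
struct cqe {
	uint16_t status;		/* bit 0 is the phase tag */
	/* command_id, sq_head, etc. elided */
};

struct cq_state {
	struct cqe *cqes;		/* completion queue entries */
	uint16_t head;			/* consumer index */
	uint16_t depth;
	bool phase;			/* expected phase; flips on each wrap */
};

/*
 * Consume entries whose phase tag matches, stop at the first mismatch.
 * If init rewrites head/phase without q_lock held while this loop runs,
 * a stale 'phase' makes leftover entries look valid, and head advances
 * through completions that never happened.
 */
static void poll_cq(struct cq_state *cq)
{
	while ((cq->cqes[cq->head].status & 1) == cq->phase) {
		/* reap the command identified by this entry here */
		if (++cq->head == cq->depth) {
			cq->head = 0;
			cq->phase = !cq->phase;
		}
	}
}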
But beyond nvme_resume, it appears a race condition is possible in any
scenario where a device is reinitialized and cannot create the same
number of IO queues it had originally. Part of the problem is there
doesn't seem to be a way to change a tagset's nr_hw_queues after it was
created. The conditions that lead to this scenario should be uncommon,
so I haven't given it much thought; I need to untangle dynamic namespaces
first. :)
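
For the curious, this is the shape of the reinit gap, again with
stand-in types rather than the real blk-mq structures (hctx_is_dead()
is a hypothetical helper to show the window, not a blk-mq interface):

/* Illustrative stand-ins; not the actual blk-mq/nvme structures. */
struct tagset_like { unsigned int nr_hw_queues; };
struct dev_like {
	struct tagset_like tagset;	/* sized once at probe time */
	unsigned int online_queues;	/* admin queue + live IO queues */
};

/*
 * After a reinit that comes up with fewer IO queues, any hw context
 * index in this range still exists in the tagset but has no backing
 * nvme queue, since nr_hw_queues can't currently be shrunk.
 */
static int hctx_is_dead(const struct dev_like *dev, unsigned int hctx_idx)
{
	return hctx_idx >= dev->online_queues - 1 &&
	       hctx_idx <  dev->tagset.nr_hw_queues;
}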