Message-ID: <20210629115027.rtohoxtl7cmycdqr@beryllium.lan>
Date: Tue, 29 Jun 2021 13:50:27 +0200
From: Daniel Wagner <dwagner@...e.de>
To: Ming Lei <ming.lei@...hat.com>
Cc: Wen Xiong <wenxiong@...ibm.com>, james.smart@...adcom.com,
linux-kernel@...r.kernel.org, sagi@...mberg.me,
wenxiong@...ux.vnet.ibm.com
Subject: Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port
On Tue, Jun 29, 2021 at 06:06:21PM +0800, Ming Lei wrote:
> > No, I don't see any errors. I am still trying to reproduce it on real
> > hardware. The setup with blktests running in QEMU did work with all
> > patches applied (the ones from me and your patches).
> >
> > About the error argument: Later in the code path, e.g. in
> > __nvme_submit_sync_cmd() transport errors (incl. canceled request) are
> > handled as well, hence the upper layer will see errors during connection
> > attempts. My point is, there is nothing special about the connection
> > attempt failing. We have error handling code in place and the above
> > state machine has to deal with it.
>
> My two patches not only avoid the kernel panic, they also allow the
> request to be allocated successfully, so the connect io queue request
> can be submitted to the driver even when all CPUs in hctx->cpumask are
> offline, and nvmef can be set up properly.
>
> That is the difference from yours: failing the request allocation means
> the io queues can't be connected, so the whole host can't be set up
> successfully and becomes a brick. The point is that cpu offline
> shouldn't prevent setting up nvme fc/rdma/tcp/loop.
Right, I think I see your point now.
> > Anyway, avoiding the if in the hotpath is a good thing. I just don't
> > think your argument that no error can happen is correct.
>
> Again, it isn't related to avoiding the if, and it isn't in the hotpath
> at all.
I mixed up blk_mq_alloc_request() with blk_mq_alloc_request_hctx().
Thanks for the explanation. I'll keep trying to replicate the problem
on real hardware and see if these patches mitigate it.
Thanks,
Daniel