Message-ID: <YMggg+0mVwA0Gl4j@T590>
Date: Tue, 15 Jun 2021 11:37:39 +0800
From: Ming Lei <ming.lei@...hat.com>
To: Daniel Wagner <dwagner@...e.de>
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH] blk-mq: Do not lookup ctx with invalid index
On Mon, Jun 14, 2021 at 01:37:06PM +0200, Daniel Wagner wrote:
> On Tue, Jun 08, 2021 at 08:33:39PM +0200, Daniel Wagner wrote:
> > cpumask_first_and() returns >= nr_cpu_ids if the two provided masks do
> > not share a common bit. Verify we get a valid value back from
> > cpumask_first_and().
>
> So I got feedback on this issue (but not on the patch itself yet). The
> system starts with 16 virtual CPU cores and during the test 4 cores are
> removed[1] and as soon as there is an error on the storage side, the reset
> code on the host ends up in this path and crashes. I still don't
> understand why the CPU removal is not updating the CPU mask correctly
> before we hit the reset path. I'll continue to investigate.
We don't update hctx->cpumask when a CPU is added or removed; it is
assigned against cpu_possible_mask from the beginning.

This is a long-standing issue, which can be triggered when all CPUs in
hctx->cpumask go offline. The thing is that only nvmf_connect_io_queue()
allocates a request via a specified hctx.
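
To illustrate the failure mode, here is a rough userspace sketch, not the
kernel code. The bitmask type, NR_CPU_IDS, first_and() and
pick_online_cpu() are hypothetical stand-ins; only the return contract
("a value >= nr_cpu_ids when the masks share no bit") is taken from the
patch description quoted above.

```c
#include <stdint.h>

/* Hypothetical stand-in for nr_cpu_ids: 16 CPUs, as in the report. */
#define NR_CPU_IDS 16u

/*
 * Mimics the return contract of cpumask_first_and(): the index of the
 * first bit set in both masks, or a value >= NR_CPU_IDS if the masks
 * share no bit. One bit per CPU, bit 0 is CPU 0.
 */
static unsigned int first_and(uint64_t a, uint64_t b)
{
	uint64_t both = a & b;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (both & (UINT64_C(1) << cpu))
			return cpu;

	return NR_CPU_IDS;
}

/*
 * Returns a CPU that is both in the hctx mask and online, or -1 when
 * every CPU in the hctx mask is offline. Without the range check the
 * caller would index per-CPU data with an invalid CPU number.
 */
static int pick_online_cpu(uint64_t hctx_mask, uint64_t online_mask)
{
	unsigned int cpu = first_and(hctx_mask, online_mask);

	if (cpu >= NR_CPU_IDS)
		return -1;

	return (int)cpu;
}
```

With an hctx mask of 0xF000 (CPUs 12-15) and an online mask of 0x0FFF
(CPUs 0-11), first_and() finds no common bit, so pick_online_cpu() has to
report failure instead of handing back an index.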
thanks,
Ming