linux-kernel - Re: [PATCH 0/2] nvme-fc: fix schedule in atomic context

Message-ID: <fn3jxqr4gtcmgqp6l6vpveyhzj2z7qfhvwygurvptpdks46qze@xjlk3lbzfvkv>
Date: Thu, 20 Feb 2025 12:50:33 +0000
From: Shinichiro Kawasaki <shinichiro.kawasaki@....com>
To: Daniel Wagner <wagi@...nel.org>
CC: Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>, hch
	<hch@....de>, Sagi Grimberg <sagi@...mberg.me>, James Smart
	<james.smart@...adcom.com>, Hannes Reinecke <hare@...e.de>,
	"linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/2] nvme-fc: fix schedule in atomic context

On Feb 14, 2025 / 09:02, Daniel Wagner wrote:
> Shinichiro reported [1] the recent change in the nvme-fc setup path [2]
> introduced a bug. I didn't spot the schedule call in
> nvme_change_ctrl_state.
> 
> It turns out the locking is not necessary if we make the state machine a
> bit more restrictive and only allow entering the LIVE state from
> CONNECTING. If we do this, it's possible to ensure we enter LIVE only
> if there was no connection loss event. Also the connection loss
> event handler should always trigger the reset handler to avoid a
> read-write race on the state machine state variable.
> 
> I've tried to replicate the original problem once again and wrote a new
> blktest which tries to trigger the race condition. I let it run for a
> while and nothing broke, but I can't be sure it is really gone. The rest
> of the blktests also passed. Unfortunately, the test box with FC hardware
> is currently not working, so I can't test this with real hardware.
> 
> [1] https://lore.kernel.org/all/denqwui6sl5erqmz2gvrwueyxakl5txzbbiu3fgebryzrfxunm@iwxuthct377m/
> [2] https://lore.kernel.org/all/20250109-nvme-fc-handle-com-lost-v4-3-fe5cae17b492@kernel.org/
> 
> Signed-off-by: Daniel Wagner <wagi@...nel.org>
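
For readers following the thread, the restricted state machine described in
the cover letter can be sketched as a small user-space model. This is only an
illustration under stated assumptions, not the driver code: the names
ctrl_state, ctrl, change_state and connection_lost are hypothetical, the set
of states is reduced to four, and the transitions other than "LIVE only from
CONNECTING" are assumed for the sake of the example.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of controller states (the real driver has more). */
enum ctrl_state { CTRL_NEW, CTRL_CONNECTING, CTRL_RESETTING, CTRL_LIVE };

struct ctrl {
	enum ctrl_state state;
};

/*
 * Attempt a state transition and report whether it was allowed.
 * The rule taken from the cover letter: LIVE may only be entered from
 * CONNECTING. The other transitions are assumptions made for this sketch.
 */
bool change_state(struct ctrl *c, enum ctrl_state new_state)
{
	bool allowed = false;

	switch (new_state) {
	case CTRL_LIVE:
		allowed = (c->state == CTRL_CONNECTING);
		break;
	case CTRL_RESETTING:
		allowed = (c->state == CTRL_NEW ||
			   c->state == CTRL_CONNECTING ||
			   c->state == CTRL_LIVE);
		break;
	case CTRL_CONNECTING:
		allowed = (c->state == CTRL_NEW ||
			   c->state == CTRL_RESETTING);
		break;
	default:
		break;
	}

	if (allowed)
		c->state = new_state;
	return allowed;
}

/* Model of a connection loss event: always push the controller to RESETTING. */
void connection_lost(struct ctrl *c)
{
	change_state(c, CTRL_RESETTING);
}
```

In this model, a connection loss that arrives while the controller is still
CONNECTING moves it to RESETTING, so a later change_state(c, CTRL_LIVE) from
the setup path returns false. The setup path can detect the loss from that
return value alone, without holding a lock across the whole setup.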

Thanks. I reconfirmed that this series avoids the failure I reported [1]. Also I
ran all nvme test cases with various transports and observed no regression.

Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@....com>