linux-kernel - RE: [PATCH v3 2/2] nvme: handle persistent internal error AER from NVMe controller


Message-ID: <PH0PR21MB3025187DD97DA485E6931793D7A49@PH0PR21MB3025.namprd21.prod.outlook.com>
Date:   Wed, 8 Jun 2022 03:59:00 +0000
From:   "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
To:     Christoph Hellwig <hch@....de>
CC:     "kbusch@...nel.org" <kbusch@...nel.org>,
        "axboe@...com" <axboe@...com>,
        "sagi@...mberg.me" <sagi@...mberg.me>,
        "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Caroline Subramoney <Caroline.Subramoney@...rosoft.com>,
        Richard Wurdack <riwurd@...rosoft.com>,
        Nathan Obr <Nathan.Obr@...rosoft.com>
Subject: RE: [PATCH v3 2/2] nvme: handle persistent internal error AER from
 NVMe controller

From: Christoph Hellwig <hch@....de> Sent: Tuesday, June 7, 2022 3:36 AM
> 
> On Mon, Jun 06, 2022 at 05:15:15PM -0700, Michael Kelley wrote:
> > +static void nvme_handle_aer_persistent_error(struct nvme_ctrl *ctrl)
> > +{
> > +	trace_nvme_async_event(ctrl, NVME_AER_ERROR);
> > +
> > +	/*
> > +	 * We can't read the CSTS here because we're in an atomic context on
> > +	 * some transports and the read may require submitting a request to the
> > +	 * controller and getting a response. Such a sequence isn't
> > +	 * likely to be successful anyway if the controller is reporting a
> > +	 * persistent internal error. So assume CSTS.CFS is set.
> > +	 */
> > +	if (nvme_should_reset(ctrl, NVME_CSTS_CFS)) {
> > +		dev_warn(ctrl->device, "resetting controller due to AER\n");
> > +		nvme_reset_ctrl(ctrl);
> 
> I don't think we even need the nvme_should_reset check now.
> 
> nvme_reset_ctrl first calls nvme_change_ctrl_state, which only allows
> the transition to the RESETTING state if it previously was NEW or LIVE,
> so we are already covered.  The only downside would be an extra kernel
> message if we already were in another state.

OK, I agree.  Patch 1/2 can be dropped since there's now no need to
move nvme_should_reset(), and patch 2 is simplified even further.

I'll do a v4.

Michael
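
For illustration only, the point made in the review can be sketched as a
small user-space model in C. This is not the v4 patch and not kernel code:
every model_* name below is hypothetical, and the state list is reduced to
what the thread mentions plus one extra state used to show a refused
transition. It assumes, as described above, that the reset path itself
refuses the move to RESETTING unless the controller was NEW or LIVE, so the
AER handler needs no separate should-reset check.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, reduced set of controller states for this model. */
enum model_ctrl_state {
	MODEL_CTRL_NEW,
	MODEL_CTRL_LIVE,
	MODEL_CTRL_RESETTING,
	MODEL_CTRL_DELETING,
};

struct model_ctrl {
	enum model_ctrl_state state;
	int resets_started;	/* counts resets that were actually started */
};

/* Allow the move to RESETTING only from NEW or LIVE. */
static bool model_change_to_resetting(struct model_ctrl *ctrl)
{
	if (ctrl->state != MODEL_CTRL_NEW && ctrl->state != MODEL_CTRL_LIVE)
		return false;
	ctrl->state = MODEL_CTRL_RESETTING;
	return true;
}

/* Returns 0 if a reset was started, -1 if the transition was refused. */
static int model_reset_ctrl(struct model_ctrl *ctrl)
{
	if (!model_change_to_resetting(ctrl))
		return -1;
	ctrl->resets_started++;
	return 0;
}

/*
 * AER handler with no separate should-reset check: the state transition
 * inside model_reset_ctrl() is the only guard. The message is printed
 * even when the reset is refused, which is the "extra kernel message"
 * downside noted in the review.
 */
static int model_handle_aer_persistent_error(struct model_ctrl *ctrl)
{
	fprintf(stderr, "resetting controller due to AER\n");
	return model_reset_ctrl(ctrl);
}
```

In this model, calling model_handle_aer_persistent_error() on a LIVE
controller starts one reset and leaves it in RESETTING. A second call, or a
call on a controller in the DELETING state, logs the message again but
returns -1 without starting another reset.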