Message-ID: <PH0PR21MB3025781A702070304BB8A282D7A09@PH0PR21MB3025.namprd21.prod.outlook.com>
Date:   Sat, 4 Jun 2022 14:28:11 +0000
From:   "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
To:     Keith Busch <kbusch@...nel.org>
CC:     "axboe@...com" <axboe@...com>, "hch@....de" <hch@....de>,
        "sagi@...mberg.me" <sagi@...mberg.me>,
        "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Caroline Subramoney <Caroline.Subramoney@...rosoft.com>,
        Richard Wurdack <riwurd@...rosoft.com>,
        Nathan Obr <Nathan.Obr@...rosoft.com>
Subject: RE: [PATCH v2 2/2] nvme: handle persistent internal error AER from
 NVMe controller

From: Keith Busch <kbusch@...nel.org> Sent: Friday, June 3, 2022 12:23 PM
> 
> On Fri, Jun 03, 2022 at 10:56:01AM -0700, Michael Kelley wrote:
> 
> This series looks good to me. Just one concern below that may amount to
> nothing.
> 
> > +static void nvme_handle_aer_persistent_error(struct nvme_ctrl *ctrl)
> > +{
> > +	u32 csts;
> > +
> > +	trace_nvme_async_event(ctrl, NVME_AER_ERROR);
> > +
> > +	if (ctrl->ops->reg_read32(ctrl, NVME_REG_CSTS, &csts) != 0 ||
> 
> The reg_read32() is non-blocking for pcie, so this is safe to call from that
> driver's irq handler. The other transports block on register reads, though, so
> they can't call this from an atomic context. The TCP context looks safe, but
> I'm not sure about RDMA or FC.

Good point.  But even if the RDMA and FC contexts are safe, if a
persistent error is reported, the controller is already in trouble and
may not respond to a request to retrieve the CSTS anyway.  Perhaps
we should just trust the AER error report and not bother checking
CSTS to decide whether to do the reset.  We can still check ctrl->state
and skip the reset if there's already one in progress.
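
A rough userspace sketch of that idea, for illustration only. It is not
the kernel code: struct mock_ctrl, the CTRL_* states and mock_reset_ctrl()
are hypothetical stand-ins for struct nvme_ctrl, ctrl->state and
nvme_reset_ctrl(). The handler trusts the AER, reads no register, and only
skips the reset when one is already in progress.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING };

struct mock_ctrl {
	enum ctrl_state state;
	int resets_requested;
};

/* Hypothetical stand-in for nvme_reset_ctrl(): records the request. */
static void mock_reset_ctrl(struct mock_ctrl *ctrl)
{
	ctrl->state = CTRL_RESETTING;
	ctrl->resets_requested++;
}

/*
 * Trust the persistent internal error AER: no CSTS read, so nothing
 * here can block. Returns true if a reset was requested.
 */
static bool handle_aer_persistent_error(struct mock_ctrl *ctrl)
{
	/* A reset is already in progress; do not queue another one. */
	if (ctrl->state == CTRL_RESETTING)
		return false;

	fprintf(stderr, "resetting controller due to AER\n");
	mock_reset_ctrl(ctrl);
	return true;
}
```

With this shape, a second persistent error AER that arrives while the
first reset is still running is simply ignored.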

> 
> > +	    nvme_should_reset(ctrl, csts)) {
> > +		dev_warn(ctrl->device, "resetting controller due to AER\n");
> > +		nvme_reset_ctrl(ctrl);
> > +	}
> > +}
> > +
> >  void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
> >  		volatile union nvme_result *res)
> >  {
> >  	u32 result = le32_to_cpu(res->u32);
> >  	u32 aer_type = result & 0x07;
> > +	u32 aer_subtype = (result & 0xff00) >> 8;
> 
> Since the above mask + shift is duplicated with nvme_handle_aen_notice(), an
> inline helper function seems reasonable.

Yep.  Will do in v3.
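
Something along these lines, perhaps. This is only a sketch: the helper
names aer_type() and aer_subtype() are placeholders, and uint32_t stands
in for the kernel's u32 so that it builds outside the kernel tree. The
masks and the shift are the ones from the patch hunk quoted above.

```c
#include <stdint.h>

/* Bits 2:0 of the AER completion result: the event type. */
static inline uint32_t aer_type(uint32_t result)
{
	return result & 0x07;
}

/* Bits 15:8 of the AER completion result: the event subtype. */
static inline uint32_t aer_subtype(uint32_t result)
{
	return (result & 0xff00) >> 8;
}
```

Both nvme_complete_async_event() and nvme_handle_aen_notice() could then
call the helper instead of repeating the mask + shift.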

Michael
