Message-ID: <20230601151134.00006281@Huawei.com>
Date:   Thu, 1 Jun 2023 15:11:34 +0100
From:   Jonathan Cameron <Jonathan.Cameron@...wei.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
CC:     Robert Richter <rrichter@....com>,
        Terry Bowman <terry.bowman@....com>,
        <alison.schofield@...el.com>, <vishal.l.verma@...el.com>,
        <ira.weiny@...el.com>, <bwidawsk@...nel.org>,
        <dan.j.williams@...el.com>, <dave.jiang@...el.com>,
        <linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <bhelgaas@...gle.com>
Subject: Re: [PATCH v4 23/23] PCI/AER: Unmask RCEC internal errors to enable
 RCH downstream port error handling


> > > > @@ -1432,6 +1495,7 @@ static int aer_probe(struct pcie_device *dev)
> > > >  		return status;
> > > >  	}
> > > >  
> > > > +	cxl_rch_enable_rcec(port);  
> > > 
> > > Could this be done by the driver that claims the CXL RCiEP?  There's
> > > no point in unmasking the errors before there's a driver with
> > > pci_error_handlers that can do something with them anyway.  
> > 
> > This sounds reasonable at first glance. The problem is there could
> > be many devices associated with the RCEC. Not all of them will be
> > bound to a driver and handler at the same time. We would need to
> > refcount it or maintain a list of enabled devices. But there is
> > already something similar by checking dev->driver. But right, AER
> > errors could be seen and handled then at least on PCI level. I tend to
> > permanently enable RCEC AER, but that could cause side-effects. What
> > do you think?  
> 
> IIUC, this really just affects CXL devices, so I think the choice is
> (1) always unmask internal errors for RCECs where those CXL devices
> report errors (as this patch does), or (2) unmask when first CXL
> driver that can handle the errors is loaded and restore previous state
> when last one is unloaded.
> 
> If the RCEC *only* handles errors for CXL devices, i.e., not for a mix
> of vanilla PCIe RCiEPs and CXL RCiEPs, I think I'm OK with (1).  I
> think you said only the CXL driver knows how to collect and interpret
> the error data.  Is it OK that when no such driver is loaded, we field
> error interrupts silently, without even mentioning that an error
> occurred?  I guess without the driver, the device is probably not in
> use.

It might be in use.  Firmware may well have set up the CXL device and
even have put the kernel image in that memory, for example.  OS-first RAS
handling won't be up until the driver loads, though.  It would be a bit
odd to mix OS-first handling with firmware setup.  I'd expect firmware-first
handling in that case, but I don't think anything stops the two
being mixed.

Jonathan
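
For readers following the thread, option (2) in the quoted text (unmask when
the first CXL driver that can handle the errors is loaded, restore the
previous state when the last one is unloaded) amounts to a refcounted
get/put pair. The sketch below is only an illustration under stated
assumptions, not code from the patch series: `struct rcec_sketch`, the
`rcec_sketch_get()`/`rcec_sketch_put()` helpers and `SKETCH_INTERNAL_ERR_BIT`
are hypothetical names, and the mask register is a plain variable rather
than real PCI config space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the "internal error" bit in an AER mask register. */
#define SKETCH_INTERNAL_ERR_BIT (1u << 22)

/* Hypothetical model of one RCEC; a set bit in mask_reg means "masked". */
struct rcec_sketch {
	uint32_t mask_reg;    /* simulated mask register */
	uint32_t saved_bit;   /* state of the internal-error bit before first get */
	unsigned int users;   /* number of handlers currently bound */
};

/* Called when a driver with error handlers binds: first user unmasks. */
static void rcec_sketch_get(struct rcec_sketch *r)
{
	if (r->users++ == 0) {
		r->saved_bit = r->mask_reg & SKETCH_INTERNAL_ERR_BIT;
		r->mask_reg &= ~SKETCH_INTERNAL_ERR_BIT;
	}
}

/* Called when such a driver unbinds: last user restores the saved state. */
static void rcec_sketch_put(struct rcec_sketch *r)
{
	assert(r->users > 0);
	if (--r->users == 0)
		r->mask_reg = (r->mask_reg & ~SKETCH_INTERNAL_ERR_BIT) |
			      r->saved_bit;
}
```

Real kernel code would additionally need locking around the counter and
actual config-space accesses; both are left out here. The sketch also shows
the bookkeeping cost Robert mentions, which option (1) (always unmask in
aer_probe()) avoids.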
