lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67ad27da2f65c_2d2c294b9@dwillia2-xfh.jf.intel.com.notmuch>
Date: Wed, 12 Feb 2025 14:59:38 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: Terry Bowman <terry.bowman@....com>, <linux-cxl@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>,
	<nifan.cxl@...il.com>, <dave@...olabs.net>, <jonathan.cameron@...wei.com>,
	<dave.jiang@...el.com>, <alison.schofield@...el.com>,
	<vishal.l.verma@...el.com>, <dan.j.williams@...el.com>,
	<bhelgaas@...gle.com>, <mahesh@...ux.ibm.com>, <ira.weiny@...el.com>,
	<oohall@...il.com>, <Benjamin.Cheatham@....com>, <rrichter@....com>,
	<nathan.fontenot@....com>, <Smita.KoralahalliChannabasappa@....com>,
	<lukas@...ner.de>, <ming.li@...omail.com>,
	<PradeepVineshReddy.Kodamati@....com>
Subject: Re: [PATCH v7 10/17] cxl/pci: Add log message and add type check in
 existing RAS handlers

Terry Bowman wrote:
> The CXL RAS handlers do not currently log if the RAS registers are
> unmapped. This is needed in order to help debug CXL error handling. Update
> the CXL driver to log a warning message if the RAS register block is
> unmapped.
> 
> Also, add type check before processing EP or RCH DP.
> 
> Signed-off-by: Terry Bowman <terry.bowman@....com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
> Reviewed-by: Ira Weiny <ira.weiny@...el.com>
> Reviewed-by: Gregory Price <gourry@...rry.net>
> ---
>  drivers/cxl/core/pci.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 69bb030aa8e1..af809e7cbe3b 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -658,15 +658,19 @@ static void __cxl_handle_cor_ras(struct device *dev,
>  	void __iomem *addr;
>  	u32 status;
>  
> -	if (!ras_base)
> +	if (!ras_base) {
> +		dev_warn_once(dev, "CXL RAS register block is not mapped");
>  		return;
> +	}
>  
>  	addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
>  	status = readl(addr);
> -	if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
> -		writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> +	if (!(status & CXL_RAS_CORRECTABLE_STATUS_MASK))
> +		return;
> +	writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> +
> +	if (is_cxl_memdev(dev))
>  		trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);

I think trace_cxl_aer_correctable_error() should always fire and this
should somehow be unified with the CPER record trace-event for protocol
errors.

The only usage of @memdev in this trace is retrieving the device serial
number. If the device is not a memdev then print zero for the serial
number, or something like that.

In the end RAS daemon should only need to enable one trace event to get
protocol errors and header logs from ports or endpoints, either
natively, or via CPER.


> -	}
>  }
>  
>  static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds)
> @@ -702,8 +706,10 @@ static bool __cxl_handle_ras(struct device *dev, void __iomem *ras_base)
>  	u32 status;
>  	u32 fe;
>  
> -	if (!ras_base)
> +	if (!ras_base) {
> +		dev_warn_once(dev, "CXL RAS register block is not mapped");

Is this a "never can happen" print? It seems like an oversight in an
upper layer to get this far down error reporting without the registers
mapped.

Like maybe this is a bug in a driver that should crash, or the driver
should not be registering broken error handlers?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ