Message-ID: <20250429165410.00002c86@huawei.com>
Date: Tue, 29 Apr 2025 16:54:10 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: Karolina Stolarek <karolina.stolarek@...cle.com>
CC: Bjorn Helgaas <helgaas@...nel.org>, "Shen, Yijun" <Yijun.Shen@...l.com>,
	Bjorn Helgaas <bhelgaas@...gle.com>, <linux-pci@...r.kernel.org>, Jon Pan-Doh
	<pandoh@...gle.com>, Terry Bowman <terry.bowman@....com>, Len Brown
	<lenb@...nel.org>, James Morse <james.morse@....com>, Tony Luck
	<tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>, Ben Cheatham
	<Benjamin.Cheatham@....com>, Ira Weiny <ira.weiny@...el.com>, Shuai Xue
	<xueshuai@...ux.alibaba.com>, Liu Xinpeng <liuxp11@...natelecom.cn>, "Darren
 Hart" <darren@...amperecomputing.com>, Dan Williams
	<dan.j.williams@...el.com>, <linux-cxl@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] PCI/AER: Consolidate CXL, ACPI GHES and native AER
 reporting paths

On Fri, 25 Apr 2025 16:12:26 +0200
Karolina Stolarek <karolina.stolarek@...cle.com> wrote:

> On 25/04/2025 15:14, Jonathan Cameron wrote:
> > On Fri, 25 Apr 2025 12:32:10 +0200
> > Karolina Stolarek <karolina.stolarek@...cle.com> wrote:  
> >> 
> >> It's possible that some of the nuances of this escaped me. I decided to
> >> pick up the series, as I saw "PCI Express bus error injection via GHES"
> >> script and thought it might be useful.  
> > 
> > With Mauro's series you can inject (on ARM64 virt) any CPER record you
> > like.  That doesn't synchronize the wider state of the system though
> > so may not exercise everything (PCI registers etc not updated as it
> > is only injecting the record).  Mostly it just works, as remarkably
> > few error handlers actually take the state of the components on which
> > the error is reported into account.  
> 
> OK, that means even if we manage to inject a PCIe error, AER wouldn't be 
> able to look up the Source ID and other values it needs to report an 
> error, which is not quite the solution I was looking for.

Isn't the source ID in the CPER record? (Device ID field) or do
you mean something else?
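
For illustration only, a minimal sketch of how a 16-bit source
(Requester) ID could be assembled from the bus, device and function
values carried in a CPER PCIe error section's Device ID field. The
struct and helper names below are hypothetical simplifications made up
for this example; they are not the kernel's own definitions.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for the Device ID field of a CPER
 * PCIe error section. Only the members needed for this example are kept.
 */
struct cper_pcie_device_id {
	uint16_t segment;
	uint8_t bus;
	uint8_t device;   /* 5 significant bits */
	uint8_t function; /* 3 significant bits */
};

/* Pack device and function the way a PCI devfn byte is encoded. */
static inline uint8_t pcie_devfn(uint8_t device, uint8_t function)
{
	return (uint8_t)(((device & 0x1f) << 3) | (function & 0x07));
}

/* 16-bit Requester ID: bus in bits 15:8, devfn in bits 7:0. */
static inline uint16_t cper_source_id(const struct cper_pcie_device_id *id)
{
	return (uint16_t)((id->bus << 8) | pcie_devfn(id->device, id->function));
}
```

With bus 0x3a, device 0x1f and function 7, cper_source_id() gives
0x3aff. The segment is kept separately, as it is not part of the
16-bit ID.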

> 
> > The aim is specifically to allow exercising FW first error handling
> > paths because it's a pain to get real systems that have firmware to inject
> > the full range of what the kernel etc need to handle.  
> 
> Does this include PCIe errors? If so, it probably doesn't make sense 
> to try to test my patch on an actual system?

Ideally test it on a real system as well, but indeed the intent is to
allow testing of PCI errors on emulation.

> 
> > x86 support for emulated injection is a work in progress (more of a mess wrt
> > the different ways the event signaling is handled than it is on arm64).
> > 
> > I did have an earlier version of that work wired up to the same
> > hooks as the native CXL error injection but I dropped it from my QEMU
> > CXL staging tree for now as it was a pain to rebase whilst Mauro was rapidly
> > revising the infrastructure.  I'll bring it back when I get time.  
> 
> I understand, I saw some of your series while looking for ways to test 
> my patch. Thank you very much for your work. As you can see, there are 
> people actually looking forward to it :)

Great!  I'll try and get back to wiring it all up again sometime soon.

Jonathan

> 
> 
> All the best,
> Karolina
> 
> > 
> > Jonathan
> >   
> >>  
> >>> Unfortunately there are some typos in the spec (FIRMWARE_FIRST,
> >>> FIRMWAREFIRST in 18.4), so it's a little hard to find all the
> >>> references.  
> >>
> >> Thanks for the pointers, I'll take a look.
> >>  
> >>> It's a long shot, but I added Yijun as a Dell contact who might
> >>> have a pointer to someone who could possibly test GHES logging on a
> >>> Dell box with and without your patch so we could have a concrete
> >>> comparison of the dmesg log differences.  
> >>
> >> Thank you very much. Let's see, maybe we'll get lucky :)
> >>
> >> All the best,
> >> Karolina
> >>  
> >>>      
> >>>>> If you can't produce actual logs for comparison, I think we can take
> >>>>> info from a sample log somebody has posted and synthesize what the
> >>>>> changes would be after this patch.  
> >>>>
> >>>> I also found some logs at some point, mostly from 2021 and 2023, but I felt
> >>>> bad about mocking up the messages and tried to produce actual logs. If I
> >>>> can't find a way to get this working in two weeks, I'll revisit this idea.
> >>>>
> >>>> All the best,
> >>>> Karolina
> >>>>
> >>>> -------------------------------------------------------------
> >>>> [1] - https://lore.kernel.org/lkml/76824dfc6bb5dd23a9f04607a907ac4ccf7cb147.1740653898.git.mchehab+huawei@kernel.org/  
> >>
> >>  
> >   
> 
> 

