Message-ID: <20250506180300.00006527@huawei.com>
Date: Tue, 6 May 2025 18:03:00 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: Karolina Stolarek <karolina.stolarek@...cle.com>
CC: Bjorn Helgaas <helgaas@...nel.org>, "Shen, Yijun" <Yijun.Shen@...l.com>,
	Bjorn Helgaas <bhelgaas@...gle.com>, <linux-pci@...r.kernel.org>, Jon Pan-Doh
	<pandoh@...gle.com>, Terry Bowman <terry.bowman@....com>, Len Brown
	<lenb@...nel.org>, James Morse <james.morse@....com>, Tony Luck
	<tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>, Ben Cheatham
	<Benjamin.Cheatham@....com>, Ira Weiny <ira.weiny@...el.com>, Shuai Xue
	<xueshuai@...ux.alibaba.com>, Liu Xinpeng <liuxp11@...natelecom.cn>, "Darren
 Hart" <darren@...amperecomputing.com>, Dan Williams
	<dan.j.williams@...el.com>, <linux-cxl@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] PCI/AER: Consolidate CXL, ACPI GHES and native AER
 reporting paths

On Mon, 5 May 2025 11:58:25 +0200
Karolina Stolarek <karolina.stolarek@...cle.com> wrote:

> On 29/04/2025 17:54, Jonathan Cameron wrote:
> > On Fri, 25 Apr 2025 16:12:26 +0200
> > Karolina Stolarek <karolina.stolarek@...cle.com> wrote:  
> >>
> >> OK, that means even if we manage to inject a PCIe error, AER wouldn't be
> >> able to look up the Source ID and other values it needs to report an
> >> error, which is not quite the solution I was looking for.  
> > 
> > Isn't the source ID in the CPER record? (Device ID field) or do
> > you mean something else?  
> 
> Ah, sorry, I got confused on the way. I meant that even if we have the 
> Device ID in CPER set, the specific device has no data in aer_regs if we 
> inject an error using the GHES error injection script. We probably would 
> end up with !info->status in aer_print_error(), thus printing only a 
> line about an "Inaccessible" agent and returning early.

If you are feeling creative with scripts, you might be able to make this
work today.  QEMU does allow native AER injection via pcie_aer_inject_error,
which fills in the AER state in the device and 'tries' to trigger an interrupt.
That last bit will fail (I think) if we are doing FW first handling.
You might need to just prevent the interrupt generation, in a similar fashion
to what this code does:

https://gitlab.com/jic23/qemu/-/commit/ce801e4d5b5cc5417cc7c7e5ecdaaa2ca5d6efe3#8eeec1fb38fa7149cc37b7a56dc193d69281ee96_704_708

At that point, if you inject a GHES error using Mauro's stuff, it will work
and find that pre-injected hardware info.

If not, we need a refresh of that patch to hook up record generation with
Mauro's new handling.  That's what I plan to get to, but it will be a while yet.
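
For the first step (pre-loading the device with AER state), a rough sketch
of a script helper that builds the HMP command line is below.  Treat it as
an illustration only: the flag order follows my reading of the monitor's
help text for pcie_aer_inject_error (-a for advisory non-fatal, -c for
correctable, then the qdev id and the error status), and the device id
"nic0" and status value are made up.  Check "help pcie_aer_inject_error" in
your QEMU monitor before relying on it.

```python
def build_aer_inject_cmd(dev_id, error_status,
                         correctable=False, advisory_non_fatal=False):
    """Build an HMP 'pcie_aer_inject_error' command line as a string.

    dev_id       -- qdev id of the target device (hypothetical in the example)
    error_status -- error name or 32-bit status value, passed through as text
    """
    parts = ["pcie_aer_inject_error"]
    if advisory_non_fatal:
        parts.append("-a")   # assumed: marks the error as advisory non-fatal
    if correctable:
        parts.append("-c")   # assumed: marks the error as correctable
    parts.append(dev_id)
    parts.append(error_status)
    return " ".join(parts)


if __name__ == "__main__":
    # "nic0" and "0x1" are placeholders, not values from this thread.
    print(build_aer_inject_cmd("nic0", "0x1", correctable=True))
```

The resulting string would be typed (or piped) into the QEMU monitor before
running the GHES injection, so that the record lookup has something to find.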

J



> 
> >>> The aim is specifically to allow exercising FW first error handling
> >>> paths because it's a pain to get real systems that have firmware to inject
> >>> the full range of what the kernel etc need to handle.  
> >>
> >> Does this include PCIe errors? If so, it probably doesn't make sense
> >> to try to test my patch on an actual system?  
> > 
> > Ideally test it on a real system as well, but indeed the intent is to
> > allow testing of PCI errors on emulation.  
> 
> I understand. Do you have pointers on how to inject it on a real system? 
> All info I could find about FW error injection pointed to the qemu 
> scripts I mentioned.

Sorry, no.  It may be system specific and disabled on production BIOS.

> 
> >>> x86 support for emulated injection is a work in progress (more of a mess wrt
> >>> the different ways the event signaling is handled than it is on arm64).
> >>>
> >>> I did have an earlier version of that work wired up to the same
> >>> hooks as the native CXL error injection but I dropped it from my QEMU
> >>> CXL staging tree for now as it was a pain to rebase whilst Mauro was rapidly
> >>> revising the infrastructure.  I'll bring it back when I get time.  
> >>
> >> I understand, I saw some of your series while looking for ways to test
> >> my patch. Thank you very much for your work. As you can see, there are
> >> people actually looking forward to it :)  
> > 
> > Great!  I'll try and get back to wiring it all up again sometime soon.  
> 
> Awesome, thanks.
> 
> Bjorn, is this patch blocking the ratelimiting series? Would it be 
> acceptable to use public logs in the commit message? I'm asking because 
> it looks like there's no easy way to trigger the GHES path, or it would 
> take some time, further delaying the ratelimiting work.
> 
> All the best,
> Karolina
> 
> > 
> > Jonathan
> >   
> 

