Message-ID: <20250429165410.00002c86@huawei.com>
Date: Tue, 29 Apr 2025 16:54:10 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: Karolina Stolarek <karolina.stolarek@...cle.com>
CC: Bjorn Helgaas <helgaas@...nel.org>, "Shen, Yijun" <Yijun.Shen@...l.com>,
Bjorn Helgaas <bhelgaas@...gle.com>, <linux-pci@...r.kernel.org>, Jon Pan-Doh
<pandoh@...gle.com>, Terry Bowman <terry.bowman@....com>, Len Brown
<lenb@...nel.org>, James Morse <james.morse@....com>, Tony Luck
<tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>, Ben Cheatham
<Benjamin.Cheatham@....com>, Ira Weiny <ira.weiny@...el.com>, Shuai Xue
<xueshuai@...ux.alibaba.com>, Liu Xinpeng <liuxp11@...natelecom.cn>, "Darren
Hart" <darren@...amperecomputing.com>, Dan Williams
<dan.j.williams@...el.com>, <linux-cxl@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] PCI/AER: Consolidate CXL, ACPI GHES and native AER
 reporting paths

On Fri, 25 Apr 2025 16:12:26 +0200
Karolina Stolarek <karolina.stolarek@...cle.com> wrote:
> On 25/04/2025 15:14, Jonathan Cameron wrote:
> > On Fri, 25 Apr 2025 12:32:10 +0200
> > Karolina Stolarek <karolina.stolarek@...cle.com> wrote:
> >>
> >> It's possible that some of the nuances of this escaped me. I decided to
> >> pick up the series, as I saw "PCI Express bus error injection via GHES"
> >> script and thought it might be useful.
> >
> > With Mauro's series you can inject (on ARM64 virt) any CPER record you
> > like. That doesn't synchronize the wider state of the system though
> > so may not exercise everything (PCI registers etc not updated as it
> > is only injecting the record). Mostly it just works, as remarkably
> > few error handlers actually take the state of the components on which
> > the error is reported into account.
>
> OK, that means even if we manage to inject a PCIe error, AER wouldn't be
> able to look up the Source ID and other values it needs to report an
> error, which is not quite the solution I was looking for.
Isn't the source ID in the CPER record (Device ID field)? Or do
you mean something else?
>
> > The aim is specifically to allow exercising FW first error handling
> > paths because it's a pain to get real systems that have firmware to inject
> > the full range of what the kernel etc need to handle.
>
> Does this include PCIe errors? If so, it probably doesn't make sense
> to try to test my patch on an actual system?
Ideally test it on a real system as well, but indeed the intent is to
allow testing of PCI errors on emulation.
>
> > x86 support for emulated injection is a work in progress (more of a mess wrt
> > the different ways the event signaling is handled than it is on arm64).
> >
> > I did have an earlier version of that work wired up to the same
> > hooks as the native CXL error injection but I dropped it from my QEMU
> > CXL staging tree for now as it was a pain to rebase whilst Mauro was rapidly
> > revising the infrastructure. I'll bring it back when I get time.
>
> I understand, I saw some of your series while looking for ways to test
> my patch. Thank you very much for your work. As you can see, there are
> people actually looking forward to it :)
Great! I'll try and get back to wiring it all up again sometime soon.
Jonathan
>
>
> All the best,
> Karolina
>
> >
> > Jonathan
> >
> >>
> >>> Unfortunately there are some typos in the spec (FIRMWARE_FIRST,
> >>> FIRMWAREFIRST in 18.4), so it's a little hard to find all the
> >>> references.
> >>
> >> Thanks for the pointers, I'll take a look.
> >>
> >>> It's a long shot, but I added Yijun as a Dell contact who might
> >>> have a pointer to someone who could possibly test GHES logging on a
> >>> Dell box with and without your patch so we could have a concrete
> >>> comparison of the dmesg log differences.
> >>
> >> Thank you very much. Let's see, maybe we'll get lucky :)
> >>
> >> All the best,
> >> Karolina
> >>
> >>>
> >>>>> If you can't produce actual logs for comparison, I think we can take
> >>>>> info from a sample log somebody has posted and synthesize what the
> >>>>> changes would be after this patch.
> >>>>
> >>>> I also found some logs at some point, mostly from 2021 and 2023, but I felt
> >>>> bad about mocking up the messages and tried to produce actual logs. If I
> >>>> can't find a way to get this working in two weeks, I'll revisit this idea.
> >>>>
> >>>> All the best,
> >>>> Karolina
> >>>>
> >>>> -------------------------------------------------------------
> >>>> [1] - https://lore.kernel.org/lkml/76824dfc6bb5dd23a9f04607a907ac4ccf7cb147.1740653898.git.mchehab+huawei@kernel.org/
> >>
> >>
> >
>
>