Message-ID: <0f4944a4-fd05-4365-9416-378a7385547b@oracle.com>
Date: Fri, 25 Apr 2025 16:12:26 +0200
From: Karolina Stolarek <karolina.stolarek@...cle.com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc: Bjorn Helgaas <helgaas@...nel.org>, "Shen, Yijun" <Yijun.Shen@...l.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org,
        Jon Pan-Doh <pandoh@...gle.com>, Terry Bowman <terry.bowman@....com>,
        Len Brown <lenb@...nel.org>, James Morse <james.morse@....com>,
        Tony Luck <tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>,
        Ben Cheatham <Benjamin.Cheatham@....com>,
        Ira Weiny <ira.weiny@...el.com>,
        Shuai Xue <xueshuai@...ux.alibaba.com>,
        Liu Xinpeng
 <liuxp11@...natelecom.cn>,
        Darren Hart <darren@...amperecomputing.com>,
        Dan Williams <dan.j.williams@...el.com>, linux-cxl@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] PCI/AER: Consolidate CXL, ACPI GHES and native AER
 reporting paths

On 25/04/2025 15:14, Jonathan Cameron wrote:
> On Fri, 25 Apr 2025 12:32:10 +0200
> Karolina Stolarek <karolina.stolarek@...cle.com> wrote:
>> 
>> It's possible that some of the nuances of this escaped me. I decided to
>> pick up the series, as I saw "PCI Express bus error injection via GHES"
>> script and thought it might be useful.
> 
> With Mauro's series you can inject (on ARM64 virt) any CPER record you
> like.  That doesn't synchronize the wider state of the system though
> so may not exercise everything (PCI registers etc not updated as it
> is only injecting the record).  Mostly it just works, as remarkably
> few error handlers actually take the state of the components on which
> the error is reported into account.

OK, that means even if we manage to inject a PCIe error, AER wouldn't be 
able to look up the Source ID and other values it needs to report an 
error, which is not quite the solution I was looking for.

> The aim is specifically to allow exercising FW first error handling
> paths because it's a pain to get real systems that have firmware to inject
> the full range of what the kernel etc need to handle.

Does this include PCIe errors? If so, it probably doesn't make sense 
to try to test my patch on an actual system?

> x86 support for emulated injection is a work in progress (more of a mess wrt
> the different ways the event signaling is handled than it is on arm64).
> 
> I did have an earlier version of that work wired up to the same
> hooks as the native CXL error injection but I dropped it from my QEMU
> CXL staging tree for now as it was a pain to rebase whilst Mauro was rapidly
> revising the infrastructure.  I'll bring it back when I get time.

I understand, I saw some of your series while looking for ways to test 
my patch. Thank you very much for your work. As you can see, there are 
people actually looking forward to it :)


All the best,
Karolina

> 
> Jonathan
> 
>>
>>> Unfortunately there are some typos in the spec (FIRMWARE_FIRST,
>>> FIRMWAREFIRST in 18.4), so it's a little hard to find all the
>>> references.
>>
>> Thanks for the pointers, I'll take a look.
>>
>>> It's a long shot, but I added Yijun as a Dell contact who might
>>> have a pointer to someone who could possibly test GHES logging on a
>>> Dell box with and without your patch so we could have a concrete
>>> comparison of the dmesg log differences.
>>
>> Thank you very much. Let's see, maybe we'll get lucky :)
>>
>> All the best,
>> Karolina
>>
>>>    
>>>>> If you can't produce actual logs for comparison, I think we can take
>>>>> info from a sample log somebody has posted and synthesize what the
>>>>> changes would be after this patch.
>>>>
>>>> I also found some logs at some point, mostly from 2021 and 2023, but I felt
>>>> bad about mocking up the messages and tried to produce actual logs. If I
>>>> can't find a way to get this working in two weeks, I'll revisit this idea.
>>>>
>>>> All the best,
>>>> Karolina
>>>>
>>>> -------------------------------------------------------------
>>>> [1] - https://lore.kernel.org/lkml/76824dfc6bb5dd23a9f04607a907ac4ccf7cb147.1740653898.git.mchehab+huawei@kernel.org/
>>
>>
> 

