linux-kernel - Re: [PATCH v3 00/14] Change ghes to use HEST-based offsets and add support for error inject

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250221073823.061a1039@foz.lan>
Date: Fri, 21 Feb 2025 07:38:23 +0100
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Igor Mammedov <imammedo@...hat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@...wei.com>, "Michael S . Tsirkin"
 <mst@...hat.com>, Shiju Jose <shiju.jose@...wei.com>,
 <qemu-arm@...gnu.org>, <qemu-devel@...gnu.org>, Philippe Mathieu-Daudé <philmd@...aro.org>, Ani Sinha
 <anisinha@...hat.com>, Cleber Rosa <crosa@...hat.com>, Dongjiu Geng
 <gengdongjiu1@...il.com>, Eduardo Habkost <eduardo@...kost.net>, Eric Blake
 <eblake@...hat.com>, John Snow <jsnow@...hat.com>, Marcel Apfelbaum
 <marcel.apfelbaum@...il.com>, "Markus Armbruster" <armbru@...hat.com>,
 Michael Roth <michael.roth@....com>, "Paolo Bonzini" <pbonzini@...hat.com>,
 Peter Maydell <peter.maydell@...aro.org>, Shannon Zhao
 <shannon.zhaosl@...il.com>, Yanan Wang <wangyanan55@...wei.com>, Zhao Liu
 <zhao1.liu@...el.com>, <kvm@...r.kernel.org>,
 <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 00/14] Change ghes to use HEST-based offsets and add
 support for error inject

Em Mon, 3 Feb 2025 16:22:36 +0100
Igor Mammedov <imammedo@...hat.com> escreveu:

> On Mon, 3 Feb 2025 11:09:34 +0000
> Jonathan Cameron <Jonathan.Cameron@...wei.com> wrote:
> 
> > On Fri, 31 Jan 2025 18:42:41 +0100
> > Mauro Carvalho Chehab <mchehab+huawei@...nel.org> wrote:
> >   
> > > Now that the ghes preparation patches were merged, let's add support
> > > for error injection.
> > > 
> > > On this series, the first 6 patches chang to the math used to calculate offsets at HEST
> > > table and hardware_error firmware file, together with its migration code. Migration tested
> > > with both latest QEMU released kernel and upstream, on both directions.
> > > 
> > > The next patches add a new QAPI to allow injecting GHESv2 errors, and a script using such QAPI
> > >    to inject ARM Processor Error records.
> > > 
> > > If I'm counting well, this is the 19th submission of my error inject patches.    
> > 
> > Looks good to me. All remaining trivial things are in the category
> > of things to consider only if you are doing another spin.  The code
> > ends up how I'd like it at the end of the series anyway, just
> > a question of the precise path to that state!  
> 
> if you look at series as a whole it's more or less fine (I guess you
> and me got used to it)
> 
> however if you take it patch by patch (as if you've never seen it)
> ordering is messed up (the same would apply to everyone after a while
> when it's forgotten)
> 
> So I'd strongly suggest to restructure the series (especially 2-6/14).
> re sum up my comments wrt ordering:
> 
> 0  add testcase for HEST table with current HEST as expected blob
>    (currently missing), so that we can be sure that we haven't messed
>    existing tables during refactoring.

Not sure if I got this one. The HEST table is part of etc/acpi/tables,
which is already tested, as you pointed at the previous reviews. Doing
changes there is already detected. That's basically why we added patches
10 and 12:

	[PATCH v3 10/14] tests/acpi: virt: allow acpi table changes for a new table: HEST
	[PATCH v3 12/14] tests/acpi: virt: add a HEST table to aarch64 virt and update DSDT

What tests don't have is a check for etc/hardware_errors firmware inside 
tests/data/acpi/aarch64/virt/, but, IMO, we shouldn't add it there.

See, hardware_errors table contains only some skeleton space to
store:

	- 1 or more error block address offsets;
	- 1 or more read ack register;
	- 1 or more HEST source entries containing CPER blocks.

There's nothing there to be actually checked: it is just some
empty spaces with a variable number of fields.

With the new code, the actual number of CPER blocks and their
corresponding offsets and read ack registers can be different on 
different architectures. So, for instance, when we add x86 support,
we'll likely start with just one error source entry, while arm will
have two after this changeset.

Also, one possibility to address the issues reported by Gavin Shan at
https://lore.kernel.org/qemu-devel/20250214041635.608012-1-gshan@redhat.com/
would be to have one entry per each CPU. So, the size of such firmware
could be dependent on the number of CPUs.

So, adding any validation to it would just cause pain and probably
won't detect any problems.

What could be done instead is to have a different type of tests that
would use the error injection script to check if regressions are 
introduced after QEMU 10.0. Such new kind of test would require
this series to be merged first. It would also require the usage of
an OSPM image with some testing tools on it. This is easier said 
than done, as besides the complexity of having an OSPM test image,
such kind of tests would require extra logic, specially if it would
check regressions for SEA and other notification sources.

Thanks,
Mauro