Message-ID: <20240731091133.07ddd58c@foz.lan>
Date: Wed, 31 Jul 2024 09:11:33 +0200
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Igor Mammedov <imammedo@...hat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@...wei.com>, Shiju Jose
 <shiju.jose@...wei.com>, "Michael S. Tsirkin" <mst@...hat.com>, Ani Sinha
 <anisinha@...hat.com>, Dongjiu Geng <gengdongjiu1@...il.com>, Eric Blake
 <eblake@...hat.com>, Markus Armbruster <armbru@...hat.com>, Michael Roth
 <michael.roth@....com>, Paolo Bonzini <pbonzini@...hat.com>, Peter Maydell
 <peter.maydell@...aro.org>, linux-kernel@...r.kernel.org,
 qemu-arm@...gnu.org, qemu-devel@...gnu.org
Subject: Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses
 and FW first ARM processor error injection

Em Tue, 30 Jul 2024 13:17:09 +0200
Igor Mammedov <imammedo@...hat.com> escreveu:

> On Mon, 22 Jul 2024 08:45:56 +0200
> Mauro Carvalho Chehab <mchehab+huawei@...nel.org> wrote:
> 
> that's quite a bit of code that in 99% of cases won't ever be used
> (assuming an error injection testing scenario),
> not to mention it's hw-dependent and governed by different specs.
>
> Essentially we would need to create _whole_ lot of QAPI
> commands to cover possible errors for no benefit to QEMU.
> 
> Let's take for example the very simple _OST status reporting.
> QEMU of course can decode values and present them to users in
> a more 'presentable' form. However, instead of translating
> numbers (aka spec language) into a made-up QEMU language,
> QEMU just passes values up the stack and users can use
> the well-defined spec to interpret their meaning.
> 
> benefits are: QEMU doesn't have to maintain translation
> code and QAPI ABI is limited to passing raw values.
> 
> Can we do similar thing here as well?
> i.e. simplify error injection commands to
> a command that takes raw value and passes it
> to guest (QEMU here acts as proxy, if I'm not
> mistaken)?
> 
> Preferably make it generic enough to handle
> not only ARM but other error formats HEST is
> able to handle.

An overly generic interface doesn't sound feasible to me, as the
EINJ code needs to check QEMU implementation details before
injecting the error.

See, the processor is probably the simplest error injection
source, as most of the fields there aren't related to how
the hardware emulation is done.

Yet, if you see patch 7 of this series, you'll notice that some
fields should actually be filled based on the emulation.

On ARM, we have some IDs that depend on the emulation
(MIDR, MPIDR, power state). Doing that in userspace may require
a QAPI command to query them.

The memory layout, however, is the most complex one. Even for
an ARM processor CPER (which is the simplest scenario), the
physical/virtual addresses need to be checked against the emulation
environment.
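
To illustrate the kind of check meant here, a minimal sketch follows. It
is not QEMU code: the Region type, the GUEST_RAM table and its values
are hypothetical placeholders, as the real layout depends on the machine
type and on how QEMU mapped the guest memory.

```python
from typing import NamedTuple, Sequence


class Region(NamedTuple):
    """One contiguous guest-physical range (hypothetical description)."""
    name: str
    base: int
    size: int


# Hypothetical layout, for illustration only. A real tool would have to
# query the emulator for the actual mapping.
GUEST_RAM = [Region("ram0", 0x4000_0000, 0x8000_0000)]


def phys_addr_is_backed(addr: int, regions: Sequence[Region] = GUEST_RAM) -> bool:
    """Return True if addr falls inside one of the known regions."""
    return any(r.base <= addr < r.base + r.size for r in regions)
```

An injection request carrying a physical address outside every region
would be rejected (or flagged) before the CPER record is built.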

Other error sources (like memory errors, CXL, etc.) will require
deep knowledge of how QEMU mapped such devices.

So, in practice, if we move this to an EINJ script, we'll need
to add a QAPI that is probably more complex, to allow querying the
memory layout and other device- and CPU-specific bindings.

Also, we don't know what newer versions of the ACPI spec have in
store for us. See, even the HEST table contents depend on the HEST
revision number, as made clear in the ACPI 6.5 notes:

	https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#acpi-error-source

and at:

	https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#error-source-structure-header-type-12-onward

So, if we're willing to add support for a more generic "raw data"
QAPI, I would still do it per type, and only for the fields that
don't require knowledge of the device-emulation details.

Btw, my proposal in patch 7 of this series is to have raw data
for:
	- the error-info field;
	- the registers dump;
	- micro-architecture specific data.

I don't mind trying to have more raw data there, as I see (marginal)
benefits in allowing invalid CPER records to be generated [1], but some
of those fields need to be validated and/or filled internally by QEMU,
unless forced to a specific value by the caller.
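
A rough sketch of that split, under stated assumptions: the function
and key names below are made up for illustration and do not match the
QAPI in patch 7. Emulation-dependent IDs fall back to what the emulator
knows unless the caller forces a value, while the raw fields are passed
through untouched.

```python
# Fields that depend on the emulation (hypothetical key names).
EMULATION_FIELDS = ("midr", "mpidr", "power_state")
# Fields carried as opaque raw data (hypothetical key names).
RAW_FIELDS = ("error_info", "registers", "vendor_specific")


def build_record(request: dict, emulated: dict) -> dict:
    """Merge a caller request with values known only to the emulator."""
    record = {}
    for key in EMULATION_FIELDS:
        # The caller may force a value; otherwise use the emulated one.
        record[key] = request.get(key, emulated[key])
    for key in RAW_FIELDS:
        # Raw payloads are not interpreted, only copied.
        record[key] = bytes(request.get(key, b""))
    return record
```

With this shape, a caller that only wants a well-formed record supplies
nothing but the raw payloads, and a caller that wants an invalid record
overrides the IDs explicitly.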

[1] a raw data EINJ can be useful for fuzzing-style fault detection, to
    check that badly formed records won't cause a Kernel panic or be
    exploitable. Yet, that's not really a concern for APEI: if the
    hardware is faulty, a Kernel panic is not off the table. Also, if
    the BIOS is already compromised and has malicious code in it,
    the EINJ interface is not the main concern.
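
    As an illustration of what [1] has in mind, badly formed records
    could be derived from a valid raw payload by flipping random bits.
    This is only a sketch with a hypothetical helper name; it is not
    part of this series nor of qemu_einj.py.

```python
import random


def mutate(blob: bytes, flips: int, seed: int) -> bytes:
    """Return a copy of blob with `flips` randomly chosen bit flips.

    The seed makes a failing case reproducible. The same bit may be
    picked more than once, so fewer than `flips` bits may end up changed.
    """
    if not blob:
        return blob
    rng = random.Random(seed)
    out = bytearray(blob)
    for _ in range(flips):
        pos = rng.randrange(len(out) * 8)
        out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)
```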

> PS:
> For user convenience, QEMU can carry a script that
> could help generate this raw value in a user-friendly way,
> but at the same time it won't put a maintenance
> burden on QEMU itself.

The script will still require reviews, and the same code will
be there. So, in terms of maintenance burden, there won't be much
difference.

Btw, I'm actually using a script myself to test it, currently
sitting together with rasdaemon, which is the Linux tool to detect
and handle hardware errors:

	https://github.com/mchehab/rasdaemon/blob/master/contrib/qemu_einj.py

as it helps a lot when trying to simulate more complex errors.

Once QEMU gains support for injecting processor errors, I can prepare
a separate patch to move it to QEMU.

Thanks,
Mauro
