[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240916130506.293ab543@imammedo.users.ipa.redhat.com>
Date: Mon, 16 Sep 2024 13:05:06 +0200
From: Igor Mammedov <imammedo@...hat.com>
To: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
Cc: "Michael S. Tsirkin" <mst@...hat.com>, Ani Sinha <anisinha@...hat.com>,
Dongjiu Geng <gengdongjiu1@...il.com>, linux-kernel@...r.kernel.org,
qemu-arm@...gnu.org, qemu-devel@...gnu.org
Subject: Re: [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST
address
On Sat, 14 Sep 2024 07:33:14 +0200
Mauro Carvalho Chehab <mchehab+huawei@...nel.org> wrote:
> Hi Igor,
>
> Em Fri, 13 Sep 2024 15:25:18 +0200
> Igor Mammedov <imammedo@...hat.com> escreveu:
>
> > > > in addition to this, it needs a patch on top to make sure
> > > > that we migrate hest_addr_le.
> > > > See a08a64627b6b 'ACPI: Record the Generic Error Status Block address'
> > > > and fixes on top of that for an example.
> > >
> > > Hmm... If I understood such change well, vmstate_ghes_state() will
> > > use this structure as basis to do migration:
> > >
> > > /* ghes.h */
> > > typedef struct AcpiGhesState {
> > > uint64_t hest_addr_le;
> > > uint64_t ghes_addr_le;
> > > bool present; /* True if GHES is present at all on this board */
> > > } AcpiGhesState;
> > >
> > > /* generic_event_device.c */
> > > static const VMStateDescription vmstate_ghes_state = {
> > > .name = "acpi-ged/ghes",
> > > .version_id = 1,
> > > .minimum_version_id = 1,
> > > .needed = ghes_needed,
> > > .fields = (VMStateField[]) {
> > > VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> > > vmstate_ghes_state, AcpiGhesState),
> > > VMSTATE_END_OF_LIST()
> > > }
> > > };
> >
> > current code looks like that:
> >
> > static const VMStateDescription vmstate_ghes = {
> > .name = "acpi-ghes",
> > .version_id = 1,
> > .minimum_version_id = 1,
> > .fields = (const VMStateField[]) {
> > VMSTATE_UINT64(ghes_addr_le, AcpiGhesState), <<===
> > VMSTATE_END_OF_LIST()
> > },
> > };
> >
> > static bool ghes_needed(void *opaque)
> > {
> > AcpiGedState *s = opaque;
> > return s->ghes_state.ghes_addr_le;
> > }
> >
> > static const VMStateDescription vmstate_ghes_state = {
> > .name = "acpi-ged/ghes",
> > .version_id = 1,
> > .minimum_version_id = 1,
> > .needed = ghes_needed,
> > .fields = (const VMStateField[]) {
> > VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> > vmstate_ghes, AcpiGhesState),
> > VMSTATE_END_OF_LIST()
> > }
> > };
> >
> > where
> > VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
> > explicitly defines field(s) within structure to be sent over wire.
> >
> > we need to add a conditional field for ghes_addr_le
> > which will be sent only with new machine types, but not with old ones
> > to avoid migration breakage.
> >
> > I don't know much about migration, but maybe we can get away with
> > similar condition as in ghes_needed(), or enabling QMP error injection
> > based on machine type version.
> >
> > Or maybe adding a CLI option to enable QMP error injection in which
> > case the explicit option would serve as a trigger enable QMP command and
> > to migrate hest_addr_le.
> > It might be even better this way, since machine wouldn't need to
> > carry extra error source that will be used only for testing
> > and practically never in production VMs (aka reduced attack surface).
> >
> > You can easily test it locally:
> > new-qemu: with your patches
> > old-qemu: qemu-9.1
> >
> > and then try to do forth & back migration for following cases:
> > 1. (ping-pong case with OLD firmware/ACPI tables)
> > start old-qemu with 9.1 machine type ->
> > migrate to file ->
> > start new-qemu with 9.1 machine type -> restore from file ->
> > migrate to file ->
>
> As I never used migration, I'm a little stuck with the command line
> parameters.
>
> I guess I got the one to do the migration at the monitor:
>
> (qemu) migrate file://tmp/migrate
>
> But no idea how to start a machine using a saved state.
see https://www.linux-kvm.org/page/Migration
'savevm/loadvm to an external state file (using pseudo-migration)' section
>
> > start old-qemu with 9.1 machine type ->restore from file ->
> >
> > 2. (ping-pong case with NEW firmware/ACPI tables)
> > do the same as #1 but starting with new-qemu binary
> >
> > (from upstream pov #2 is optional, but not implementing it
> > is pain for downstream so it's better to have it if it's not
> > too much work)
>
> If I understood the migration documentation, every when new fields
> are added, we should increment .version_id. If new version is
> not backward-compatible, .minimum_version_id is also incremented.
>
> So, for a migration-compatible code with a 9.1 VM, the code needs to
> handle the case where hest_addr_le is not defined, e. g. use offsets
> relative to ghes_addr_le, just like the current version, e.g.:
>
> uint64_t cper_addr, read_ack_start_addr;
>
> AcpiGedState *acpi_ged_state =
> ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL));
> AcpiGhesState *ags = &acpi_ged_state->ghes_state;
>
> if (!ags->hest_addr_le) {
> // Backward-compatible migration code
> uint64_t base = le64_to_cpu(ags->ghes_addr_le);
>
> *read_ack_start_addr = base +
> ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> error_source_to_index[notify] * sizeof(uint64_t);
>
> *cper_addr = base +
> ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> } else {
> // Use the new logic from ags->hest_addr_le
> }
>
> There are two problems with that:
>
> 1. On your reviews, if I understood right, the code above is not
> migration safe. So, while implementing it would be technically
> correct, migration still won't work;
>
> 2. With the new code, ACPI_GHES_ERROR_SOURCE_COUNT is not
> defined anymore, as the size of the error source structure can
> be different on different architectures, being 2 for new
> VMs and 1 for old ones.
>
> Basically the new code gets it right because it can see a
> pointer to the HEST table, so it can get the number from there:
>
> hest_addr = le64_to_cpu(ags->hest_addr_le);
> cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
>
> But, without hest_addr_le, getting num_sources is not possible.
>
> An alternative would be to add a hacky code that works only for
> arm machines (as new versions may support more archs).
>
> Something like:
> #define V1_ARM_ACPI_GHES_ERROR_SOURCE_COUNT 1
> #define V2_ARM_ACPI_GHES_ERROR_SOURCE_COUNT 2
>
> And have a hardcoded logic that would work before/after this
> changeset but may break on newer versions, if the number of
> source IDs change, if we add other HEST types, etc.
>
> Now, assuming that such hack would work, it sounds too hacky to
> my taste.
>
> So, IMO it is a lot safer to not support migrations from v1 (only
> ghes_addr_le), using a patch like the enclosed one to ensure that.
>
> Btw, checking existing migration structs, it sounds that for almost all
> structures, .version_id is identical to .minimum_version_id, meaning that
> migration between different versions aren't supported on most cases.
let me try to find more examples on how to implement migration bits
hopefully more comprehensible.
>
> Thanks,
> Mauro
>
> ---
>
> [PATCH] acpi/generic_event_device: Update GHES migration to cover hest addr
>
> The GHES migration logic at GED should now support HEST table
> location too.
>
> Increase migration version and change needed to check for both
> ghes_addr_le and hest_addr_le.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
>
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index b4c83a089a02..efae0ff62c7b 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -351,10 +351,11 @@ static const VMStateDescription vmstate_ged_state = {
>
> static const VMStateDescription vmstate_ghes = {
> .name = "acpi-ghes",
> - .version_id = 1,
> - .minimum_version_id = 1,
> + .version_id = 2,
> + .minimum_version_id = 2,
> .fields = (const VMStateField[]) {
> VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
> + VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
> VMSTATE_END_OF_LIST()
> },
> };
> @@ -362,13 +363,13 @@ static const VMStateDescription vmstate_ghes = {
> static bool ghes_needed(void *opaque)
> {
> AcpiGedState *s = opaque;
> - return s->ghes_state.ghes_addr_le;
> + return s->ghes_state.ghes_addr_le && s->ghes_state.hest_addr_le;
> }
>
> static const VMStateDescription vmstate_ghes_state = {
> .name = "acpi-ged/ghes",
> - .version_id = 1,
> - .minimum_version_id = 1,
> + .version_id = 2,
> + .minimum_version_id = 2,
> .needed = ghes_needed,
> .fields = (const VMStateField[]) {
> VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
>
>
Powered by blists - more mailing lists