[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87sg4pa4wt.fsf@nanos.tec.linutronix.de>
Date: Sat, 20 Mar 2021 22:31:14 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: "Chang S. Bae" <chang.seok.bae@...el.com>, bp@...e.de,
luto@...nel.org, mingo@...nel.org, x86@...nel.org
Cc: len.brown@...el.com, dave.hansen@...el.com, jing2.liu@...el.com,
ravi.v.shankar@...el.com, linux-kernel@...r.kernel.org,
chang.seok.bae@...el.com
Subject: Re: [PATCH v4 18/22] x86/fpu/amx: Define AMX state components and have it used for boot-time checks
On Sun, Feb 21 2021 at 10:56, Chang S. Bae wrote:
>
> +static void check_xtile_data_against_struct(int size)
> +{
> + u32 max_palid, palid, state_size;
> + u32 eax, ebx, ecx, edx;
> + u16 max_tile;
> +
> + /*
> + * Check the maximum palette id:
> + * eax: the highest numbered palette subleaf.
> + */
> + cpuid_count(TILE_CPUID, 0, &max_palid, &ebx, &ecx, &edx);
> +
> + /*
> + * Cross-check each tile size and find the maximum
> + * number of supported tiles.
> + */
> + for (palid = 1, max_tile = 0; palid <= max_palid; palid++) {
> + u16 tile_size, max;
> +
> + /*
> + * Check the tile size info:
> + * eax[31:16]: bytes per title
> + * ebx[31:16]: the max names (or max number of tiles)
> + */
> + cpuid_count(TILE_CPUID, palid, &eax, &ebx, &edx, &edx);
> + tile_size = eax >> 16;
> + max = ebx >> 16;
> +
> + if (WARN_ONCE(tile_size != sizeof(struct xtile_data),
> + "%s: struct is %zu bytes, cpu xtile %d bytes\n",
> + __stringify(XFEATURE_XTILE_DATA),
> + sizeof(struct xtile_data), tile_size))
> + __xstate_dump_leaves();
> +
> + if (max > max_tile)
> + max_tile = max;
> + }
> +
> + state_size = sizeof(struct xtile_data) * max_tile;
> + if (WARN_ONCE(size != state_size,
> + "%s: calculated size is %u bytes, cpu state %d bytes\n",
> + __stringify(XFEATURE_XTILE_DATA), state_size, size))
> + __xstate_dump_leaves();
So we have 2 warnings which complain about inconsistent state and that's
it? Why has this absolutely no consequences? We just keep stuff enabled
and jug along, right?
Which one of the two states is correct? Why don't we just disable that
muck and be done with it to play it safe?
Failing to execute some workload by saying NO due to inconsistency is
far more useful than taking the chance of potential silent data
corruption.
Thanks,
tglx
Powered by blists - more mailing lists