Message-ID: <1c2fd06e-2e97-4724-80ab-8695aa4334e7@intel.com>
Date: Thu, 11 Jul 2024 13:58:30 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: "Yang, Weijiang" <weijiang.yang@...el.com>, tglx@...utronix.de,
x86@...nel.org, seanjc@...gle.com, pbonzini@...hat.com,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc: peterz@...radead.org, chao.gao@...el.com, rick.p.edgecombe@...el.com,
mlevitsk@...hat.com, john.allen@....com
Subject: Re: [PATCH 0/6] Introduce CET supervisor state support
On 7/8/24 20:17, Yang, Weijiang wrote:
> So I'm not sure whether XFEATURE_MASK_KERNEL_DYNAMIC and related changes
> are worth or not for this series.
>
> Could you share your thoughts?
First of all, I really do appreciate when folks make the effort to _try_
to draw their own conclusions before asking the maintainers to share
theirs. Next time, OK? ;)
But here goes. So we've basically got three cases. Here's a fancy table:
> https://docs.google.com/spreadsheets/d/e/2PACX-1vROHIgrtHzUJmdlzT7D7tuVzgM8AMlK2XlorvFIJvk-I0NjD7A-T_qntjz7cUJlCScfWGtSfPK30Xtu/pubhtml
... and the same in ASCII
Case | IA32_XSS[12] | Space | RFBM[12] | Drop%
-----+--------------+-------+----------+------
   1 |            0 | None  |        0 |  0.0%
   2 |            1 | None  |        0 |  0.2%
   3 |            1 | 24B?  |        1 |  0.2%
Case 1 is the baseline of course. Case 2 avoids allocating space for
CET and also leans on the kernel to set RFBM[12]==0 and tell the
hardware not to write CET-S state. Case 3 wastes the CET-S space in
each task and also leans on the hardware init optimization to avoid
writing out CET-S space on each XSAVES.
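To make the three cases concrete, here is a rough user-space model of how RFBM[12] falls out of IA32_XSS and the mask passed to XSAVES. It is only a sketch: the helper names (xsaves_rfbm, cet_s_written) and the CET_S_MASK macro are made up for illustration and are not kernel code, and the init optimization is modeled as always kicking in when the state is in its init configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 12 of the XSAVE feature bitmap is CET supervisor state (CET-S). */
#define XFEATURE_BIT_CET_S 12
#define CET_S_MASK (UINT64_C(1) << XFEATURE_BIT_CET_S)

/*
 * Requested-feature bitmap for XSAVES:
 *   RFBM = (XCR0 | IA32_XSS) & instruction mask (EDX:EAX)
 * Hypothetical helper, not a kernel function.
 */
static uint64_t xsaves_rfbm(uint64_t xcr0, uint64_t xss, uint64_t insn_mask)
{
    return (xcr0 | xss) & insn_mask;
}

/*
 * Simplified model: CET-S is written out only if RFBM[12] is set and
 * the state is not in its init configuration (the init optimization).
 */
static bool cet_s_written(uint64_t rfbm, bool cet_s_in_init_state)
{
    if (!(rfbm & CET_S_MASK))
        return false;            /* cases 1 and 2: never requested */
    return !cet_s_in_init_state; /* case 3: hardware skips init state */
}
```

In this model, case 1 is xss without bit 12, case 2 is xss with bit 12 but an instruction mask that clears it, and case 3 is both set with the state left in init. None of the three ends up writing CET-S for a task that never touches it.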
#1 is: 0 lines of code.
#2 is: 5 files changed, 90 insertions(+), 27 deletions(-)
#3 is: very few lines of code, nearing zero
#2 and #3 have the same performance.
So we're down to choosing between
* $BYTES space in 'struct fpu' (on hardware supporting CET-S)
or
* ~100 loc
$BYTES is 24, right? Did I get anything wrong?
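For a sense of scale on the space side, here is a back-of-the-envelope sketch. It assumes the 24-byte figure holds, i.e. CET-S is three 64-bit supervisor shadow stack pointers (PL0/PL1/PL2 SSP), and it ignores any alignment padding in the XSAVE buffer. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the CET-S layout: three 64-bit SSP values. */
struct cet_s_state {
    uint64_t pl0_ssp;
    uint64_t pl1_ssp;
    uint64_t pl2_ssp;
};

/* Assumption being checked: $BYTES == 24. */
static_assert(sizeof(struct cet_s_state) == 24, "CET-S assumed to be 3 x 8 bytes");

/* Extra memory if every task's FPU buffer carries CET-S (case 3). */
static size_t cet_s_total_bytes(size_t nr_tasks)
{
    return nr_tasks * sizeof(struct cet_s_state);
}
```

So under these assumptions 1000 tasks would cost 24000 bytes in case 3, versus nothing in case 2.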
So, here's my stake in the ground: I think the 100 lines of code is
probably worth it. But I also hate complicating the FPU code, so I'm
also somewhat drawn to just eating the 24 bytes and moving on.
But I'm still in the "case 2" camp.
Anybody disagree?