Message-ID: <20250221134036.641af213@pumpkin>
Date: Fri, 21 Feb 2025 13:40:36 +0000
From: David Laight <david.laight.linux@...il.com>
To: Andrew Cooper <andrew.cooper3@...rix.com>
Cc: Peter Zijlstra <peterz@...radead.org>, x86@...nel.org,
linux-kernel@...r.kernel.org, alyssa.milburn@...el.com,
scott.d.constable@...el.com, joao@...rdrivepizza.com, jpoimboe@...nel.org,
jose.marchesi@...cle.com, hjl.tools@...il.com, ndesaulniers@...gle.com,
samitolvanen@...gle.com, nathan@...nel.org, ojeda@...nel.org,
kees@...nel.org, alexei.starovoitov@...il.com, mhiramat@...nel.org,
jmill@....edu
Subject: Re: [PATCH v3 07/10] x86/ibt: Add paranoid FineIBT mode
On Wed, 19 Feb 2025 17:31:39 +0000
Andrew Cooper <andrew.cooper3@...rix.com> wrote:
> On 19/02/2025 4:21 pm, Peter Zijlstra wrote:
> > --- a/arch/x86/include/asm/cfi.h
> > +++ b/arch/x86/include/asm/cfi.h
> > @@ -1116,6 +1129,52 @@ extern u8 fineibt_caller_end[];
> >
> > #define fineibt_caller_jmp (fineibt_caller_size - 2)
> >
> > +/*
> > + * Since FineIBT does hash validation on the callee side it is prone to
> > + * circumvention attacks where a 'naked' ENDBR instruction exists that
> > + * is not part of the fineibt_preamble sequence.
> > + *
> > + * Notably the x86 entry points must be ENDBR and equally cannot be
> > + * fineibt_preamble.
> > + *
> > + * The fineibt_paranoid caller sequence adds additional caller side
> > + * hash validation. This stops such circumvention attacks dead, but at the cost
> > + * of adding a load.
> > + *
> > + * <fineibt_paranoid_start>:
> > + * 0: 41 ba 78 56 34 12 mov $0x12345678, %r10d
> > + * 6: 45 3b 53 f7 cmp -0x9(%r11), %r10d
> > + * a: 4d 8d 5b <f0> lea -0x10(%r11), %r11
I think that 0x10 is the size of the CFI preamble?
There should probably be at least a comment to that effect.
(Maybe there is, but I'm missing the actual patch email.)
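
FWIW, if I'm reading the earlier patches in the series right, the
preamble sits in the 16 bytes of padding before the function, roughly:

  __cfi_func:                      # at func - 0x10
  0:  f3 0f 1e fa             endbr64
  4:  41 81 ea 78 56 34 12    sub   $0x12345678, %r10d
  b:  ...

so the hash immediate lives at func - 0x10 + 7 == func - 9, which is
exactly what the cmp -0x9(%r11) above reads, and the lea -0x10(%r11)
rebases %r11 onto the preamble so the indirect call lands on the ENDBR.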
> > + * e: 75 fd jne d <fineibt_paranoid_start+0xd>
> > + * 10: 41 ff d3 call *%r11
> > + * 13: 90 nop
> > + *
> > + * Notably LEA does not modify flags and can be reordered with the CMP,
> > + * avoiding a dependency.
Is that even worth saying?
Given that the cpu does 'register renaming', the lea might execute in the
same clock as the mov.
What you do get is a few clocks of stall for the memory load (maybe 4 if
it hits the L1 cache, but a data read of code memory is unlikely to be
there - so it'll come from the L2 cache).
That means the jne is speculatively executed (and I think that is
separate from any prefetch speculation); I'll give it 50% taken.
(Or maybe 100% if backwards branches get predicted taken. I don't think
current Intel cpus do that - they just use whatever is in the branch
prediction slot.)
> > + * Again, using a non-taken (backwards) branch
> > + * for the failure case, abusing LEA's displacement byte 0xf0 as LOCK prefix for the
> > + * JCC.d8, causing #UD.
> > + */
>
> I don't know what to say. This is equal parts horrifying and beautiful.
Agreed.
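
(If I'm decoding it right, the failure path is the jne at 0xe jumping
back into the middle of the lea at 0xd, so the bytes re-decode as:

  d:  f0 75 fd                lock jne d      # LOCK on Jcc => #UD

i.e. the last byte of the lea's displacement doubles as a LOCK prefix
and the fault comes for free, without spending two bytes on a ud2.)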
Are you absolutely sure that all cpus do (and always will) raise #UD on
an unexpected LOCK prefix on a Jcc instruction?
My 80386 book does say it will #UD, but I can imagine it being ignored
or even repurposed.
David