Message-ID: <20211028202905.GO174703@worktop.programming.kicks-ass.net>
Date: Thu, 28 Oct 2021 22:29:05 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Kees Cook <keescook@...omium.org>
Cc: Ard Biesheuvel <ardb@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Sami Tolvanen <samitolvanen@...gle.com>,
X86 ML <x86@...nel.org>, Josh Poimboeuf <jpoimboe@...hat.com>,
Nathan Chancellor <nathan@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Sedat Dilek <sedat.dilek@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
linux-hardening@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
llvm@...ts.linux.dev
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI
On Thu, Oct 28, 2021 at 10:12:32AM -0700, Kees Cook wrote:
> On Thu, Oct 28, 2021 at 01:09:39PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 27, 2021 at 03:27:59PM -0700, Kees Cook wrote:
> >
> > > Right -- though wouldn't just adding __ro_after_init do the same?
> > >
> > > DEFINE_STATIC_CALL(static_call_name, func_a) __ro_after_init;
> >
> > That breaks modules (and your jump_label patch doing the same is
> > similarly broken).
>
> Well that's no fun. :) I'd like to understand this better so I can fix
> it!
>
> >
> > When a module is loaded that uses the static_call(), it needs to
> > register it's .static_call_sites range with the static_call_key which
> > requires modifying it.
>
> Reading static_call_add_module() leaves me with even more questions. ;)
Yes, that function is highly magical..
> It looks like module static calls need to write to kernel text?
No, they need to modify the static_call_key though.
> I don't
> understand. Is this when a module is using an non-module key for a call
> site? And in that case, this happens:
>
> key |= s_key & STATIC_CALL_SITE_FLAGS;
>
> Where "key" is not in the module?
>
> And the flags can be:
>
> #define STATIC_CALL_SITE_TAIL 1UL /* tail call */
> #define STATIC_CALL_SITE_INIT 2UL /* init section */
>
> But aren't these per-site attributes? Why are they stored per-key?
They are per site, but stored in the key pointer.
So static_call has (and jump_label is nearly identical):
	struct static_call_site {
		s32 addr;
		s32 key;
	};

	struct static_call_mod {
		struct static_call_mod *next;
		struct module *mod;
		struct static_call_site *sites;
	};

	struct static_call_key {
		void *func;
		union {
			unsigned long type;
			struct static_call_mod *mods;
			struct static_call_site *sites;
		};
	};
	__SCT_##name() trampolines (no analog with jump_label)
	.static_call_sites section
	.static_call_tramp_key section (no analog with jump_label)
Where the key holds the current function pointer and a pointer to either
an array of static_call_site or a pointer to a static_call_mod.
Now, a key observation is that all these data structures are word
aligned, which means we have at least 2 low-order bits to play with. For
static_call_key::{mods,sites} the LSB indicates which, 0:mods, 1:sites.
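
To make the LSB trick concrete, here is a minimal user-space sketch. The struct and helper names (struct key, key_set_sites(), KEY_SITES_BIT and so on) are simplified stand-ins invented for illustration, not the kernel's definitions; only the encoding (bit 0 clear means mods, bit 0 set means sites) is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structures (hypothetical names). */
struct site { int32_t addr, key; };
struct mod  { struct mod *next; void *mod; struct site *sites; };

struct key {
	void *func;
	union {
		uintptr_t type;       /* raw word, used to inspect bit 0 */
		struct mod *mods;
		struct site *sites;
	};
};

/* Hypothetical name: bit 0 set means the union holds a sites pointer. */
#define KEY_SITES_BIT ((uintptr_t)1)

static void key_set_sites(struct key *k, struct site *s)
{
	/* The trick only works because the pointee is at least 2-byte aligned. */
	assert(((uintptr_t)s & KEY_SITES_BIT) == 0);
	k->type = (uintptr_t)s | KEY_SITES_BIT;
}

static void key_set_mods(struct key *k, struct mod *m)
{
	assert(((uintptr_t)m & KEY_SITES_BIT) == 0);
	k->type = (uintptr_t)m;          /* bit 0 clear means mods */
}

static int key_has_mods(const struct key *k)
{
	return !(k->type & KEY_SITES_BIT);
}

/* Returns the untagged sites pointer, or NULL if the key holds mods. */
static struct site *key_sites(const struct key *k)
{
	if (key_has_mods(k))
		return NULL;
	return (struct site *)(k->type & ~KEY_SITES_BIT);
}
```

Note that a zeroed key reads as "mods, with an empty list" under this encoding.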
Then the .static_call_sites section is an array of struct
static_call_site sorted by the static_call_key pointer.
Each static_call_site holds relative displacements, but represents:
	struct static_call_key *key;
	unsigned long call_address;
Now, since code (on x86) is variable length, there are no spare bits in
the code address, but since static_call_key is aligned, we have spare
bits. It is those bits we use to encode TAIL (Bit0) and INIT (Bit1).
If INIT, the address points to an __init section and we shouldn't try
and touch it after those have been freed or bad stuff happens.
If TAIL, it's a tail-call and we get to write a jump instruction instead
of a call instruction.
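
As a rough illustration of the encoding described above, the following user-space sketch stores both fields as 32-bit displacements relative to the field's own address and keeps TAIL/INIT in the low bits of the key. The names (site_encode(), site_addr(), site_key(), SITE_*) are hypothetical, and the sketch assumes the key, the call address and the site entry lie within 2 GiB of each other so the displacements fit in an s32.

```c
#include <assert.h>
#include <stdint.h>

#define SITE_TAIL  ((uintptr_t)1)   /* mirrors STATIC_CALL_SITE_TAIL */
#define SITE_INIT  ((uintptr_t)2)   /* mirrors STATIC_CALL_SITE_INIT */
#define SITE_FLAGS ((uintptr_t)3)

struct key  { void *func; uintptr_t type; };
struct site { int32_t addr; int32_t key; };

/* Fill in a site entry: each field is "target minus address of the field". */
static void site_encode(struct site *s, const void *insn,
			const struct key *k, uintptr_t flags)
{
	/* The key must be at least 4-byte aligned to leave two spare bits. */
	assert(((uintptr_t)k & SITE_FLAGS) == 0);
	s->addr = (int32_t)((intptr_t)insn - (intptr_t)&s->addr);
	s->key  = (int32_t)((intptr_t)((uintptr_t)k | flags) - (intptr_t)&s->key);
}

/* Absolute address of the call instruction; no spare bits here. */
static void *site_addr(const struct site *s)
{
	return (void *)((intptr_t)s->addr + (intptr_t)&s->addr);
}

/* Absolute key address with the flag bits still attached. */
static uintptr_t site_key_raw(const struct site *s)
{
	return (uintptr_t)((intptr_t)s->key + (intptr_t)&s->key);
}

static struct key *site_key(const struct site *s)
{
	return (struct key *)(site_key_raw(s) & ~SITE_FLAGS);
}

static int site_is_tail(const struct site *s)
{
	return (site_key_raw(s) & SITE_TAIL) != 0;
}

static int site_is_init(const struct site *s)
{
	return (site_key_raw(s) & SITE_INIT) != 0;
}
```

A patching loop would then skip any site for which site_is_init() is true once init memory has been freed, and emit a jump rather than a call when site_is_tail() is true.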
So, objtool builds .static_call_sites at build time, then at init (or
module load) time we sort the array by static_call_key pointer, such
that we get consecutive ranges per key. We iterate the array and every
time the key pointer changes, we -- already having the key pointer --
set key->sites to the first.
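
The sort-then-assign step can be sketched in user space as follows. This is a simplification under stated assumptions: the site entries are kept in decoded form (absolute pointers rather than relative displacements), the flag bits and the sites/mods tag are ignored, and the names (sites_init(), cmp_site_key()) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct site;
struct key  { void *func; struct site *sites; };
/* Decoded form for illustration: absolute pointers, no flag bits. */
struct site { struct key *key; uintptr_t addr; };

/* Order sites by the address of the key they belong to. */
static int cmp_site_key(const void *a, const void *b)
{
	uintptr_t ka = (uintptr_t)((const struct site *)a)->key;
	uintptr_t kb = (uintptr_t)((const struct site *)b)->key;

	return (ka > kb) - (ka < kb);
}

/*
 * Sort the array so that all sites of one key are adjacent, then point
 * each key at the first entry of its run.
 */
static void sites_init(struct site *start, size_t n)
{
	struct key *prev = NULL;
	size_t i;

	qsort(start, n, sizeof(*start), cmp_site_key);

	for (i = 0; i < n; i++) {
		assert(start[i].key != NULL);
		if (start[i].key != prev) {
			prev = start[i].key;
			prev->sites = &start[i];
		}
	}
}
```

Because the key pointer is already in hand whenever it changes, no separate lookup is needed to find which key a run belongs to.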
Now, kernel init of static_call happens *really* early and memory
allocation doesn't work yet, which is why we have that {mods,sites}
thing. Therefore, when the first module gets loaded, we need to allocate
a struct static_call_mod for the kernel (mod==NULL) and transfer the
sites pointer to it and change key to a mods pointer.
So one possible solution would be to have a late init (but before RO)
that re-iterates the sites array and pre-allocates the kernel
static_call_mod structure. That way, static_call_key gets changed to a
mods pointer and wouldn't ever need changing after that, only the
static_call_mod (which isn't RO) gets changed when modules get
added/deleted.
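
A hedged user-space sketch of that idea, not existing kernel code: key_convert_to_mods() stands in for the proposed late-init step, and key_add_module() shows a later module registration that only touches the heap-allocated list and never writes to the key. All names are hypothetical, calloc() stands in for the kernel allocator, and keys without any kernel sites are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct site { int32_t addr, key; };
struct mod  { struct mod *next; void *mod; struct site *sites; };

struct key {
	void *func;
	union {
		uintptr_t type;       /* bit 0: 0 = mods, 1 = sites */
		struct mod *mods;
		struct site *sites;
	};
};

/* Late init: move the kernel's sites pointer into an allocated node. */
static int key_convert_to_mods(struct key *k)
{
	struct mod *m;

	if (!(k->type & 1))
		return 0;             /* already in mods form */

	m = calloc(1, sizeof(*m));
	if (!m)
		return -1;

	m->mod = NULL;                /* NULL stands for the kernel itself */
	m->sites = (struct site *)(k->type & ~(uintptr_t)1);
	k->mods = m;                  /* last write to the key; bit 0 is clear */
	return 0;
}

/* Module load: link a new node behind the kernel node, key untouched. */
static int key_add_module(struct key *k, void *module, struct site *sites)
{
	struct mod *m;

	/* Assumes key_convert_to_mods() already ran for this key. */
	assert(!(k->type & 1) && k->mods != NULL);

	m = calloc(1, sizeof(*m));
	if (!m)
		return -1;

	m->mod = module;
	m->sites = sites;
	m->next = k->mods->next;
	k->mods->next = m;
	return 0;
}
```

With this ordering the key itself could live in read-only memory after the late-init step, since later registrations only modify the list nodes.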
The above is basically identical to jump_labels. However static_call()
has one more trick:
EXPORT_STATIC_CALL_TRAMP()
That exports the trampoline symbol, but not the static_call_key data
structure. The result is that modules can use the static_call(), but
cannot use static_call_update() because they cannot get at the key.
In this case objtool cannot correctly put the static_call_key address in
the static_call_site, what it does instead is store the trampoline
address (there's a 1:1 relation between key and trampolines). And then we
use the .static_call_tramp_key section to find a mapping from trampoline
to key and rewrite the site to be 'right'. All this happens before
sorting it on key obv.
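
The trampoline-to-key rewrite can be sketched like this. It is an illustration under assumptions: entries are in decoded form (absolute addresses), the table is searched linearly, trampolines are assumed to be at least 4-byte aligned so the flag bits survive, and the caller is assumed to already know that the site holds a trampoline address. The names (tramp_lookup(), site_fixup()) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SITE_TAIL  ((uintptr_t)1)
#define SITE_INIT  ((uintptr_t)2)
#define SITE_FLAGS ((uintptr_t)3)

struct key { void *func; };

/* Decoded stand-in for one .static_call_tramp_key entry. */
struct tramp_key { const void *tramp; struct key *key; };

/* Decoded site: 'key' holds a key or trampoline address plus flag bits. */
struct site { uintptr_t addr; uintptr_t key; };

/* Find the key that belongs to a trampoline, or NULL if unknown. */
static struct key *tramp_lookup(const struct tramp_key *tbl, size_t n,
				const void *tramp)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].tramp == tramp)
			return tbl[i].key;
	return NULL;
}

/* Replace the trampoline address in a site with the real key address. */
static int site_fixup(struct site *s, const struct tramp_key *tbl, size_t n)
{
	uintptr_t flags = s->key & SITE_FLAGS;
	const void *tramp = (const void *)(s->key & ~SITE_FLAGS);
	struct key *k = tramp_lookup(tbl, n, tramp);

	if (!k)
		return -1;            /* no mapping for this trampoline */

	assert(((uintptr_t)k & SITE_FLAGS) == 0);
	s->key = (uintptr_t)k | flags; /* keep TAIL/INIT as they were */
	return 0;
}
```

Only after every such site has been rewritten does sorting by key give the contiguous per-key ranges described earlier.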
Hope that clarifies things, instead of making it worse :-)