[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <306E5909-C77F-415E-A2F4-6FAE3E5740DB@oracle.com>
Date: Thu, 18 Sep 2025 16:59:56 +0000
From: Qing Zhao <qing.zhao@...cle.com>
To: Kees Cook <kees@...nel.org>
CC: Andrew Pinski <pinskia@...il.com>, Jakub Jelinek <jakub@...hat.com>,
Martin Uecker <uecker@...raz.at>, Richard Biener <rguenther@...e.de>,
Joseph
Myers <josmyers@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Jan
Hubicka <hubicka@....cz>,
Richard Earnshaw <richard.earnshaw@....com>,
Richard Sandiford <richard.sandiford@....com>,
Marcus Shawcroft
<marcus.shawcroft@....com>,
Kyrylo Tkachov <kyrylo.tkachov@....com>,
Kito
Cheng <kito.cheng@...il.com>,
Palmer Dabbelt <palmer@...belt.com>,
Andrew
Waterman <andrew@...ive.com>,
Jim Wilson <jim.wilson.gcc@...il.com>,
Dan Li
<ashimida.1990@...il.com>,
Sami Tolvanen <samitolvanen@...gle.com>,
Ramon de
C Valle <rcvalle@...gle.com>,
Joao Moreira <joao@...rdrivepizza.com>,
Nathan
Chancellor <nathan@...nel.org>,
Bill Wendling <morbo@...gle.com>,
"gcc-patches@....gnu.org" <gcc-patches@....gnu.org>,
"linux-hardening@...r.kernel.org" <linux-hardening@...r.kernel.org>
Subject: Re: [PATCH v3 2/7] kcfi: Add core Kernel Control Flow Integrity
infrastructure
> On Sep 17, 2025, at 17:09, Kees Cook <kees@...nel.org> wrote:
>
> On Wed, Sep 17, 2025 at 01:42:32PM +0000, Qing Zhao wrote:
>> This version of the middle-end change is much simpler and cleaner-:).
>
> Thanks! I think it's getter closer (hopefully). :)
>
>>> On Sep 13, 2025, at 19:23, Kees Cook <kees@...nel.org> wrote:
>>> +- KCFI check-call instrumentation must survive tail call optimization.
>>> + If an indirect call is turned into an indirect jump, KCFI checking
>>> + must still happen (but it will use a jmp rather than a call).
>>
>> I didn’t see any code changes in this patch address the above issue,
>> is the issue automatically resolved without special handling?
>
> The logic for this is handled by the split RTL patterns on the backend. We
> end up with 4 RTL patterns for KCFI that match the regular 4 call
> patterns:
>
> - call
> - call with return value
> - sibcall
> - sibcall with return value
>
> In the RTL assembly output the "is this a sibcall?" test is made to
> choose between emitting a "call" or a "jump" insn.
Oh, okay, I see.
>
>>> +- Functions that may be called indirectly have a preamble added,
>>> + __cfi_$original_func_name, which contains the $typeid value:
>>> +
>>> + __cfi_target_func:
>>> + .word $typeid
>>> + target_func:
>>> + [regular function entry...]
>>> +
>>> +- The preamble needs to interact with patchable function entry so that
>>> + the typeid appears further away from the actual start of the function
>>> + (leaving the prefix NOPs of the patchable function entry unchanged).
>>> + This means only _globally defined_ patchable function entry is supported
>>> + with KCFI (indrect call sites must know in advance what the offset is,
>>> + which may not be possible with extern functions that use a function
>>> + attribute to change their patchable function entry characteristics).
>>> + For example, a "4,4" patchable function entry would end up like:
>>> +
>>> + __cfi_target_func:
>>> + .data $typeid
>>> + nop nop nop nop
>>> + target_func:
>>> + [regular function entry...]
>>> +
>>> + Architectures may need to add alignment nops prior to the typeid to keep
>>> + __cfi_target_func aligned for function call conventions.
>>
>> I am still a little confused with the above, are there two “nops” need to be computed
>> and added: one is for patchable function entry, the other one is for architecture specific
>> alignment nops?
>> If so, you might need to clarify the above to make this clear.
>
> Yes, this is a confusing bit of logic that needs more clarity. I'll
> improve this. Here's what happens:
>
> Normal function has no preamble:
>
> func:
> body...
>
> With KCFI, a preamble is created to hold the typeid to be checked from
> site sites (addressed as -4 from "func"):
>
> __cfi_func:
> .word typeid_value
> func:
> body...
>
> A "patchable function entry" function has both "prefix" and "entry" nops
> added:
>
> __pfe_func:
> nop // "prefix" nops
> nop
> func:
> nop // "entry" nops
> nop
> nop
> body...
>
> Confusingly, the argument specifies total (and optionally prefix):
> -fpatchable-function-entry=TOTAL[,PREFIX]
> So the above example is -fpatchable-function-entry=5,2 (5 total NOPs,
> with 2 of them being preamble insns).
>
> For KCFI, callsites need to address the typeid, so a normal KCFI
> callsite would use:
>
> load %tmp, -4(%target)
>
> but when PFE is active, the typeid must be placed before the prefix NOPs
> since PFE requires that the entire space is NOPs. Therefore the prefix
> NOPs need to be included (and measured in _bytes_, not instructions)
> when loading the typeid:
>
> load %tmp, -12(%target)
> // 2 nops (8 bytes on aarch64) and 4 bytes for typeid == -12
>
> Which corresponds to the resulting function preamble layout:
>
> __cfi_func:
> .word typeid_value
> __pfe_func:
> nop // "prefix" nops
> nop
> func:
> nop // "entry" nops
> nop
> nop
> body...
Okay, the above is clean.
The global:
/* For callsite typeid loading offset. */
+HOST_WIDE_INT kcfi_patchable_entry_prefix_nops = 0;
is for the above “prefix” nops. And this “prefix_nops” will impact the call site loading offset.
>
> Now, an _additional_ requirement for x86 is that __cfi_func be function
> entry aligned, so that Linux can, if it chooses, live-patch the entire
> KCFI and PFE prefix area into a callable target (this is the "FineIBT"
> KCFI alternative). So, when -falign-functions=N is set, given x86's 1
> byte NOPs and the "movl" encoding used for holding the KCFI type id, the
> final layout, given -falign-functions=8 -fpatchable-function-entry=4,1
> would be:
>
> __cfi_func:
> nop // "alignment" nops // 2 bytes total
> nop
> .word typeid_value // 5 bytes total
> __pfe_func:
> nop // "prefix" nops // 1 byte total
> func:
> nop // "entry" nops
> nop
> nop
> body...
>
> 4 total PFE bytes with 1 as prefix (leving 3 at the func entry). And
> to align __cfi_func to 8 bytes, we have 5 byte typeid insn, and 1 byte
> "prefix" nop, so we need 2 more bytes to be the "alignment" nops.
Okay, I see.
+/* For preamble alignment. */
+static HOST_WIDE_INT kcfi_patchable_entry_arch_alignment_nops = 0;
Is for this “alignment” nops. And this “alignment_nops” will NOT impact the call site loading offset.
It only impacts the position of the “__cfi_func” symbol.
The above examples explain the whole picture very well.
It might be a good idea to include them as comments of the routine “kcfi_prepare_alignment_nops”.
>
>
> This layout was not obvious initially for x86 because Linux's FineIBT
> implementation uses -falign-functions=16 -fpatchable-function-entry=11,11
> so the alignment nops are pre-calculated.
>
>>
>>> +
>>> +- External functions that are address-taken have a weak __kcfi_typeid_$func
>>> + symbol added with the typeid value available so that the typeid can be
>>> + referenced from assembly linkages, etc, where the typeid values cannot be
>>> + calculated (i.e where C type information is missing):
>>> +
>>> + .weak __kcfi_typeid_$func
>>> + .set __kcfi_typeid_$func, $typeid
>>> +
>>
>> From my previous understanding, the above weak symbol is emitted for external functions
>> that are address-taken AND does not have a definition in the compilation. So the weak symbols
>> Is emitted at the declaration site of the external function, is this true?
>>
>> If so, could you please clarify this in the above?
>
> Yes, this happens via assemble_external_real, which can be called under
> a few conditions in gcc/varasm.cc.
Okay. Please clarify this in the design doc.
>
>>> +- Keep indirect calls from being merged (see earlier example) by
>>> + checking the KCFI insn's typeid for equality.
>>
>> Is this resolved by the following code:
>>
>> rtlanal.cc
>> index 63a1d08c46cf..5016fe93ccac 100644
>> --- a/gcc/rtlanal.cc
>> +++ b/gcc/rtlanal.cc
>> @@ -1177,6 +1177,11 @@ reg_referenced_p (const_rtx x, const_rtx body)
>> case IF_THEN_ELSE:
>> return reg_overlap_mentioned_p (x, body);
>>
>> + case KCFI:
>> + /* For KCFI wrapper, check both the wrapped call and the type ID. */
>> + return (reg_overlap_mentioned_p (x, XEXP (body, 0))
>> + || reg_overlap_mentioned_p (x, XEXP (body, 1)));
>> +
>
> The above is needed for accurate register "liveness" checking. When the
> above code is removed, the kcfi-move-preservation.c regression test
> fails (since it doesn't see the clobbers).
Okay. I see.
>
> AFAICT, simply making it a new type of RTL (the DEF_RTL_EXPR), made it
> unmergeable.
Then is it possible some legal merging might not work anymore with this change?
> I assume this is because whatever was doing the call
> merging was looking strictly for "CALL" types, but I honestly don't know
> where that was happening.
>
>>> +/* Common helper for RTL patterns to emit .kcfi_traps section entry. */
>>
>> I noticed that you didn’t explain each parameter of the function in all the comments for the functions.
>> This need to be updated for all the new functions.
>
> For externs like these, should the parameter documentation go in the .h
> file, or the .cc file?
My understanding is the parameter doc going in the .cc file (just double checked some gcc files to make sure this) -:)
>
>>> +void
>>> +kcfi_emit_traps_section (FILE *file, rtx trap_label_sym)
>>> +{
>>> + /* Generate entry label internally and get its number. */
>>> + rtx entry_label = gen_label_rtx ();
>>> + int entry_labelno = CODE_LABEL_NUMBER (entry_label);
>>
>> Is the only usage of the new RTX “entry_label” is to generate a label_number?
>> If so, the entry_label is not needed at all. You can get a distinct labelno for each
>> Lkcfi_entry, for example, the function id for the current function.
>
> It is, yes. I can't use the function id because it's only incremented per
> function and a given function may have multiple kcfi call sites within
> it.
Okay. I see.
So, you need a unique lableno for each Lkcfi_entryN? Any other requirement?
> I did have a version of this logic that used a kcfi-specific global
> counter but (at the time) I was having trouble with it
What kind of issue?
> and had seen that
> other "custom label" examples in the code base used this style, so I
> switched to that.
My concern is, the new generated RTX "entry_label” is not used at all, will there be any member leak from this?
>
> I have since figured out why the global counter wasn't work (I was using
> it during expansion and not during insn output, so I had cases where a
> call was getting duplicated and I had a repeated label). If it's
> preferred, I could try switching back to the global counter to avoid
> these "useless" gen_label_rtx calls?
Yes, global counter approach is better.
>
>>> +static uint32_t
>>> +kcfi_get_type_id (tree fn_type)
>>> +{
>>> + uint32_t type_id;
>>> +
>>> + /* Cache the attribute identifier. */
>>> + if (!kcfi_type_id_attr)
>>> + kcfi_type_id_attr = get_identifier ("kcfi_type_id");
>>> +
>>> + tree attr = lookup_attribute (IDENTIFIER_POINTER (kcfi_type_id_attr),
>>> + TYPE_ATTRIBUTES (fn_type));
>>
>> The above can be simplified as:
>> + tree attr = lookup_attribute (“kcfi_type_id”, TYPE_ATTRIBUTES (fn_type));
>
> Ugh, I totally misunderstood the examples I saw of this. I thought they
> were caching the string lookup, but now that I look more closely, I see:
>
> #define IDENTIFIER_POINTER(NODE) \
> ((const char *) IDENTIFIER_NODE_CHECK (NODE)->identifier.id.str)
>
> it's just returning the string!
>
> I will throw away the "caching" I was doing. I thought it would actually
> look up the attribute using the tree returned by get_identifier, but I
> see there is no overloaded lookup_attribute that takes a tree argument.
>
> *face palm*
-:)
>
>>> +/* Emit KCFI type ID symbol for an address-taken external function. */
>>
>> Is it more accurate to say:
>>
>> Emit KCFI type ID symbol for the declaration of an address-taken external function FNDECL
>> to the assembly file ASM_FILE.
>>
>> ??
>
> Yup, I will update it.
>
>>> + /* Process all functions - both local and external. */
>>> + FOR_EACH_FUNCTION (node)
>>> + {
>>> + tree fndecl = node->decl;
>>> +
>>> + /* Skip all non-NORMAL builtins (MD, FRONTEND) entirely.
>>> + For NORMAL builtins, skip those that lack an implicit
>>> + implementation (closest way to distinguishing DEF_LIB_BUILTIN
>>> + from others). E.g. we need to have typeids for memset(). */
>>
>> I see indentation issue in the above comments.
>
> This looks like your email client again. It passes
> contrib/check_GNU_style.py:
>
> FOR_EACH_FUNCTION (node)$
> {$
> tree fndecl = node->decl;$
> $
> /* Skip all non-NORMAL builtins (MD, FRONTEND) entirely.$
> ^I For NORMAL builtins, skip those that lack an implicit$
> ^I implementation (closest way to distinguishing DEF_LIB_BUILTIN$
> ^I from others). E.g. we need to have typeids for memset(). */$
>
> Or is there something special I need to be doing differently for
> comments?
Yeah, I guess it’s issue with my mail client. Sorry about that.
>
>>
>>> + if (fndecl_built_in_p (fndecl))
>>> + {
>>> + if (DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
>>> + continue;
>>> + if (!builtin_decl_implicit_p (DECL_FUNCTION_CODE (fndecl)))
>>> + continue;
>>> + }
>>
>> Also see indentation issue in the above.
>
> if (fndecl_built_in_p (fndecl))$
> ^I{$
> ^I if (DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)$
> ^I continue;$
> ^I if (!builtin_decl_implicit_p (DECL_FUNCTION_CODE (fndecl)))$
> ^I continue;$
> ^I}$
>
> Looks like the same thing?
Yeah.
>
>
> Thanks for the review! I'll have v4 ready soon.
Thanks.
Qing
>
> -Kees
>
> --
> Kees Cook
Powered by blists - more mailing lists