[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <73e3a82b-fbf6-5448-56ba-adda130230d3@linux.com>
Date: Fri, 27 May 2022 02:25:12 +0300
From: Alexander Popov <alex.popov@...ux.com>
To: Mark Rutland <mark.rutland@....com>
Cc: linux-arm-kernel@...ts.infradead.org, akpm@...ux-foundation.org,
catalin.marinas@....com, keescook@...omium.org,
linux-kernel@...r.kernel.org, luto@...nel.org, will@...nel.org
Subject: Re: [PATCH v2 07/13] stackleak: rework poison scanning
On 24.05.2022 16:31, Mark Rutland wrote:
> On Sun, May 15, 2022 at 08:33:01PM +0300, Alexander Popov wrote:
>> On 10.05.2022 16:13, Mark Rutland wrote:
>>> On Mon, May 09, 2022 at 04:51:35PM +0300, Alexander Popov wrote:
>>>> Hello Mark!
>>>>
>>>> On 27.04.2022 20:31, Mark Rutland wrote:
>>>>> Currently we over-estimate the region of stack which must be erased.
>>>>>
>>>>> To determine the region to be erased, we scan downards for a contiguous
>>>>> block of poison values (or the low bound of the stack). There are a few
>>>>> minor problems with this today:
>>>>>
>>>>> * When we find a block of poison values, we include this block within
>>>>> the region to erase.
>>>>>
>>>>> As this is included within the region to erase, this causes us to
>>>>> redundantly overwrite 'STACKLEAK_SEARCH_DEPTH' (128) bytes with
>>>>> poison.
>>>>
>>>> Right, this can be improved.
>>>>
>>>>> * As the loop condition checks 'poison_count <= depth', it will run an
>>>>> additional iteration after finding the contiguous block of poison,
>>>>> decrementing 'erase_low' once more than necessary.
>>>>
>>>> Actually, I think the current code is correct.
>>>>
>>>> I'm intentionally searching one poison value more than
>>>> STACKLEAK_SEARCH_DEPTH to avoid the corner case. See the BUILD_BUG_ON
>>>> assertion in stackleak_track_stack() that describes it:
>>>>
>>>> /*
>>>> * Having CONFIG_STACKLEAK_TRACK_MIN_SIZE larger than
>>>> * STACKLEAK_SEARCH_DEPTH makes the poison search in
>>>> * stackleak_erase() unreliable. Let's prevent that.
>>>> */
>>>> BUILD_BUG_ON(CONFIG_STACKLEAK_TRACK_MIN_SIZE > STACKLEAK_SEARCH_DEPTH);
>>>
>>> I had read that, but as written that doesn't imply that it's necessary to scan
>>> one more element than STACKLEAK_SEARCH_DEPTH, nor why. I'm more than happy to
>>> change the logic, but I think we need a very clear explanation as to why we
>>> need to scan the specific number of bytes we scan, and we should account for
>>> that *within* STACKLEAK_SEARCH_DEPTH for clarity.
>>
>> I'll try to explain.
>>
>> The stackleak gcc plugin instruments the kernel code inserting the
>> 'stackleak_track_stack()' calls for the functions with a stack frame size
>> greater than or equal to CONFIG_STACKLEAK_TRACK_MIN_SIZE.
>>
>> The kernel functions with a smaller stack frame are not instrumented (the
>> 'lowest_stack' value is not updated for them).
>
> I understood these points.
>
> It's also worth noting that `noinstr` code will also not be instrumented
> regardless of frame size -- we might want some build-time assertion for those.
I developed a trick that shows noinstr functions that stackleak would like to instrument:
diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
index 42f0252ee2a4..6db748d44957 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -397,6 +397,9 @@ static unsigned int stackleak_cleanup_execute(void)
const char *fn = DECL_NAME_POINTER(current_function_decl);
bool removed = false;
+ if (verbose)
+ fprintf(stderr, "stackleak: I see noinstr function %s()\n", fn);
+
/*
* Leave stack tracking in functions that call alloca().
* Additional case:
@@ -464,12 +467,12 @@ static bool stackleak_gate(void)
if (STRING_EQUAL(section, ".meminit.text"))
return false;
if (STRING_EQUAL(section, ".noinstr.text"))
- return false;
+ return true;
if (STRING_EQUAL(section, ".entry.text"))
return false;
}
- return track_frame_size >= 0;
+ return false;
}
/* Build the function declaration for stackleak_track_stack() */
@@ -589,8 +592,6 @@ __visible int plugin_init(struct plugin_name_args *plugin_info,
build_for_x86 = true;
} else if (!strcmp(argv[i].key, "disable")) {
disable = true;
- } else if (!strcmp(argv[i].key, "verbose")) {
- verbose = true;
} else {
error(G_("unknown option '-fplugin-arg-%s-%s'"),
plugin_name, argv[i].key);
@@ -598,6 +599,8 @@ __visible int plugin_init(struct plugin_name_args *plugin_info,
}
}
+ verbose = true;
+
if (disable) {
if (verbose)
fprintf(stderr, "stackleak: disabled for this translation unit\n");
Building defconfig for x86_64 gives this:
stackleak: I see noinstr function __do_fast_syscall_32()
stackleak: instrument __do_fast_syscall_32(): calls_alloca
--
stackleak: I see noinstr function do_syscall_64()
stackleak: instrument do_syscall_64(): calls_alloca
--
stackleak: I see noinstr function do_int80_syscall_32()
stackleak: instrument do_int80_syscall_32(): calls_alloca
--
stackleak: I see noinstr function do_machine_check()
stackleak: instrument do_machine_check()
--
stackleak: I see noinstr function exc_general_protection()
stackleak: instrument exc_general_protection()
--
stackleak: I see noinstr function fixup_bad_iret()
stackleak: instrument fixup_bad_iret()
The cases with calls_alloca are caused by CONFIG_RANDOMIZE_KSTACK_OFFSET=y.
Kees knows about that peculiarity.
Other cases are noinstr functions with large stack frame:
do_machine_check(), exc_general_protection(), and fixup_bad_iret().
I think adding a build-time assertion is not possible, since it would break
building the kernel.
>> Any kernel function may leave uninitialized data on its stack frame. The
>> poison scanning must handle that correctly. The uninitialized groups of
>> poison values must be smaller than the search depth, otherwise
>> 'stackleak_erase()' is unreliable.
>
> I had understood this, but I had understood that for a caller->callee pair,
> *something* would be pushed onto the stack. On arm64 that'd be a frame record
> in the caller's stack frame, and on x86 that would be the return address
> between the stack frames of the caller and callee. Since any unrecorded frame
> is less than CONFIG_STACKLEAK_TRACK_MIN_SIZE, the non-poison bytes would fall
> within CONFIG_STACKLEAK_TRACK_MIN_SIZE bytes.
Yes, exactly.
> There is a potential edge case on arm64, since the frame record is permitted to
> be placed anywhere within the stack frame, and in theory it could be placed
> high on the caller and low on the callee. If we wanted to handle that, we'd
> need to scan 2 times the tracking size. In practice compilers consistently
> place the frame record at one end (usually the low end, as part of constructing
> the frame).
Good to hear that.
>> So with this BUILD_BUG_ON I control that
>> CONFIG_STACKLEAK_TRACK_MIN_SIZE <= STACKLEAK_SEARCH_DEPTH.
>>
>> To be sure and avoid mistakes in the edge cases, 'stackleak_erase()' is
>> searching one poison value more than STACKLEAK_SEARCH_DEPTH.
>
> Just to check my understanding, did you have a specific edge-case in mind, or
> was that "just in case"?
I didn't have a specific edge-case, but I wanted to avoid possible weird problems.
> It would be really nice if we had an example.
>
>> If you don't like this one additional poison value in the search, I would
>> propose to change the assertion.
>
> If we clearly document *why*, then changing the assertion is fine by me.
> However, as above I don't think that this is necessary.
>
> As an aside, why is it possible to configure CONFIG_STACKLEAK_TRACK_MIN_SIZE to
> be bigger than STACKLEAK_SEARCH_DEPTH in the first place?
>
> In security/Kconfig.hardening we have:
>
> | config STACKLEAK_TRACK_MIN_SIZE
> | int "Minimum stack frame size of functions tracked by STACKLEAK"
> | default 100
> | range 0 4096
> | depends on GCC_PLUGIN_STACKLEAK
> | help
> | The STACKLEAK gcc plugin instruments the kernel code for tracking
> | the lowest border of the kernel stack (and for some other purposes).
> | It inserts the stackleak_track_stack() call for the functions with
> | a stack frame size greater than or equal to this parameter.
> | If unsure, leave the default value 100.
>
> ... where the vast majority of that range is going to lead to a BUILD_BUG().
Honestly, I don't like the idea of having the STACKLEAK_TRACK_MIN_SIZE option in the Kconfig.
I was forced by the maintainers to introduce it when I was working on the stackleak patchset.
How about dropping CONFIG_STACKLEAK_TRACK_MIN_SIZE from Kconfig?
That would also allow to drop this build-time assertion.
>> What do you think?
>>
>>>>> As this is included within the region to erase, this causes us to
>>>>> redundantly overwrite an additional unsigned long with poison.
>>>>>
>>>>> * As we always decrement 'erase_low' after checking an element on the
>>>>> stack, we always include the element below this within the region to
>>>>> erase.
>>>>>
>>>>> As this is included within the region to erase, this causes us to
>>>>> redundantly overwrite an additional unsigned long with poison.
>>>>>
>>>>> Note that this is not a functional problem. As the loop condition
>>>>> checks 'erase_low > task_stack_low', we'll never clobber the
>>>>> STACK_END_MAGIC. As we always decrement 'erase_low' after this, we'll
>>>>> never fail to erase the element immediately above the STACK_END_MAGIC.
>>>>
>>>> Right, I don't see any bug in the current erasing code.
>>>>
>>>> When I wrote the current code, I carefully checked all the corner cases. For
>>>> example, on the first stack erasing, the STACK_END_MAGIC was not
>>>> overwritten, but the memory next to it was erased. Same for the beginning of
>>>> the stack: I carefully checked that no unpoisoned bytes were left on the
>>>> thread stack.
>>>>
>>>>> In total, this can cause us to erase `128 + 2 * sizeof(unsigned long)`
>>>>> bytes more than necessary, which is unfortunate.
>>>>>
>>>>> This patch reworks the logic to find the address immediately above the
>>>>> poisoned region, by finding the lowest non-poisoned address. This is
>>>>> factored into a stackleak_find_top_of_poison() helper both for clarity
>>>>> and so that this can be shared with the LKDTM test in subsequent
>>>>> patches.
>>>>
>>>> You know, I wrote stackleak_erase() in very plain C. I wanted a compiler to
>>>> generate assembly that is very close to the original asm version. I worried
>>>> that compilers might do weird stuff with the local variables and the stack
>>>> pointer.
>>>>
>>>> So I checked stackleak for gcc versions 4.8, 5, 6, 7, 8, 9, and 10 on
>>>> x86_64, i386 and arm64. This is my project that helped with this work:
>>>> https://github.com/a13xp0p0v/kernel-build-containers
>>>
>>> I've used the kernel.org cross toolchains, as published at:
>>>
>>> https://mirrors.edge.kernel.org/pub/tools/crosstool/
>>>
>>>> Mark, in this patch series you use many local variables and helper functions.
>>>> Honestly, this worries me. For example, compilers can (and usually do)
>>>> ignore the presence of the 'inline' specifier for the purpose of
>>>> optimization.
>>>
>>> I've deliberately used `__always_inline` rather than regular `inline` to
>>> prevent code being placed out-of-line. As mentioned in oether replies it has a
>>> stronger semantic.
>>>
>>> Thanks,
>>> Mark.
>>>
>>>>
>>>> Thanks!
>>>>
>>>>> Signed-off-by: Mark Rutland <mark.rutland@....com>
>>>>> Cc: Alexander Popov <alex.popov@...ux.com>
>>>>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>>>>> Cc: Andy Lutomirski <luto@...nel.org>
>>>>> Cc: Kees Cook <keescook@...omium.org>
>>>>> ---
>>>>> include/linux/stackleak.h | 26 ++++++++++++++++++++++++++
>>>>> kernel/stackleak.c | 18 ++++--------------
>>>>> 2 files changed, 30 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/stackleak.h b/include/linux/stackleak.h
>>>>> index 467661aeb4136..c36e7a3b45e7e 100644
>>>>> --- a/include/linux/stackleak.h
>>>>> +++ b/include/linux/stackleak.h
>>>>> @@ -42,6 +42,32 @@ stackleak_task_high_bound(const struct task_struct *tsk)
>>>>> return (unsigned long)task_pt_regs(tsk);
>>>>> }
>>>>> +/*
>>>>> + * Find the address immediately above the poisoned region of the stack, where
>>>>> + * that region falls between 'low' (inclusive) and 'high' (exclusive).
>>>>> + */
>>>>> +static __always_inline unsigned long
>>>>> +stackleak_find_top_of_poison(const unsigned long low, const unsigned long high)
>>>>> +{
>>>>> + const unsigned int depth = STACKLEAK_SEARCH_DEPTH / sizeof(unsigned long);
>>>>> + unsigned int poison_count = 0;
>>>>> + unsigned long poison_high = high;
>>>>> + unsigned long sp = high;
>>>>> +
>>>>> + while (sp > low && poison_count < depth) {
>>>>> + sp -= sizeof(unsigned long);
>>>>> +
>>>>> + if (*(unsigned long *)sp == STACKLEAK_POISON) {
>>>>> + poison_count++;
>>>>> + } else {
>>>>> + poison_count = 0;
>>>>> + poison_high = sp;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + return poison_high;
>>>>> +}
>>>>> +
>>>>> static inline void stackleak_task_init(struct task_struct *t)
>>>>> {
>>>>> t->lowest_stack = stackleak_task_low_bound(t);
>>>>> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
>>>>> index ba346d46218f5..afd54b8e10b83 100644
>>>>> --- a/kernel/stackleak.c
>>>>> +++ b/kernel/stackleak.c
>>>>> @@ -74,20 +74,10 @@ static __always_inline void __stackleak_erase(void)
>>>>> {
>>>>> const unsigned long task_stack_low = stackleak_task_low_bound(current);
>>>>> const unsigned long task_stack_high = stackleak_task_high_bound(current);
>>>>> - unsigned long erase_low = current->lowest_stack;
>>>>> - unsigned long erase_high;
>>>>> - unsigned int poison_count = 0;
>>>>> - const unsigned int depth = STACKLEAK_SEARCH_DEPTH / sizeof(unsigned long);
>>>>> -
>>>>> - /* Search for the poison value in the kernel stack */
>>>>> - while (erase_low > task_stack_low && poison_count <= depth) {
>>>>> - if (*(unsigned long *)erase_low == STACKLEAK_POISON)
>>>>> - poison_count++;
>>>>> - else
>>>>> - poison_count = 0;
>>>>> -
>>>>> - erase_low -= sizeof(unsigned long);
>>>>> - }
>>>>> + unsigned long erase_low, erase_high;
>>>>> +
>>>>> + erase_low = stackleak_find_top_of_poison(task_stack_low,
>>>>> + current->lowest_stack);
>>>>> #ifdef CONFIG_STACKLEAK_METRICS
>>>>> current->prev_lowest_stack = erase_low;
>>>>
>>
Powered by blists - more mailing lists