linux-kernel - Re: [PATCH 08/12] x86/dumpstack: Pin the target stack in save_stack_trace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160915063747.GC13992@gmail.com>
Date:   Thu, 15 Sep 2016 08:37:47 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Andy Lutomirski <luto@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     x86@...nel.org, Borislav Petkov <bp@...en8.de>,
        linux-kernel@...r.kernel.org, Brian Gerst <brgerst@...il.com>,
        Jann Horn <jann@...jh.net>,
        Thomas Gleixner <tglx@...utronix.de>,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH 08/12] x86/dumpstack: Pin the target stack in
 save_stack_trace_tsk()

* Andy Lutomirski <luto@...nel.org> wrote:

> This will prevent a crash if the target task dies before or while
> dumping its stack once we start freeing task stacks early.
> 
> Signed-off-by: Andy Lutomirski <luto@...nel.org>
> ---
>  arch/x86/kernel/stacktrace.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
> index 4738f5e0f2ab..b3f32fbe3ba4 100644
> --- a/arch/x86/kernel/stacktrace.c
> +++ b/arch/x86/kernel/stacktrace.c
> @@ -79,9 +79,14 @@ void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
>  
>  void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
>  {
> +	if (!try_get_task_stack(tsk))
> +		return;
> +
>  	dump_trace(tsk, NULL, NULL, 0, &save_stack_ops_nosched, trace);
>  	if (trace->nr_entries < trace->max_entries)
>  		trace->entries[trace->nr_entries++] = ULONG_MAX;
> +
> +	put_task_stack(tsk);
>  }
>  EXPORT_SYMBOL_GPL(save_stack_trace_tsk);

So I very much like the first half of the series, the thread_info merge into 
task_struct (yay!) and I am in the process of applying those bits to -tip.

So the above not (yet) fully correct patch is in preparation to a later change you 
are doing in this series:

    [PATCH 11/12] sched: Free the stack early if CONFIG_THREAD_INFO_IN_TASK

But I am having serious second thoughts about the fact that we now have to sprinke 
non-obvious try_get_task_stack()/put_task_stack() pairs into debugging code that 
never really needed anything like that, which pairs are easy to get wrong, easy to 
miss - and generally if we miss them will it will result in a slightly buggy, very 
unpleasant, racy, crashy behavior of the kernel.

Can we just ... not do the delayed freeing?

I mean, the cache hotness arguments don't look overly convincing to me: if we just 
start not using a piece of formerly cache hot memory then those cache lines will 
eventually decay naturally in whatever LRU scheme the CPU is using and will be 
reused without much harm done. It's just "another day in CPU land" that happens 
all the time. Yes, we could reduce the latency of the freeing and thus slightly 
improve the locality of other bits ... but is it really measurable or noticeable 
in any fashion. It's like task/thread fork/clone/exit is _that_ much of a fast 
path.

And then there's also the whole CONFIG_THREAD_INFO_IN_TASK #ifdeffery spreading.

I.e I just don't see the justification for this kind of object life time assymetry 
and hard to debug fragility.

Am I missing something?

Thanks,

	Ingo