linux-kernel - Re: [PATCH 3/5] ARM: stacktrace: Allow stack trace saving for non-current tasks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <d1605c5a-1dfb-65b6-f79f-9c92bb507d4b@huawei.com>
Date:   Tue, 26 Jul 2022 20:08:14 +0800
From:   Li Huafei <lihuafei1@...wei.com>
To:     "Russell King (Oracle)" <linux@...linux.org.uk>
CC:     Linus Walleij <linus.walleij@...aro.org>, <ardb@...nel.org>,
        <will@...nel.org>, <mark.rutland@....com>, <broonie@...nel.org>,
        <peterz@...radead.org>, <mingo@...hat.com>, <acme@...nel.org>,
        <alexander.shishkin@...ux.intel.com>, <jolsa@...nel.org>,
        <namhyung@...nel.org>, <arnd@...db.de>, <rostedt@...dmis.org>,
        <nick.hawkins@....com>, <john@...ozen.org>, <mhiramat@...nel.org>,
        <ast@...nel.org>, <linyujun809@...wei.com>,
        <ndesaulniers@...gle.com>, <linux-arm-kernel@...ts.infradead.org>,
        <linux-kernel@...r.kernel.org>, <linux-perf-users@...r.kernel.org>
Subject: Re: [PATCH 3/5] ARM: stacktrace: Allow stack trace saving for
 non-current tasks



On 2022/7/26 17:49, Russell King (Oracle) wrote:
> On Tue, Jul 26, 2022 at 05:12:39PM +0800, Li Huafei wrote:
>>
>>
>> On 2022/7/18 17:07, Linus Walleij wrote:
>>> On Tue, Jul 12, 2022 at 4:18 AM Li Huafei <lihuafei1@...wei.com> wrote:
>>>
>>>> The current ARM implementation of save_stack_trace_tsk() does not allow
>>>> saving stack trace for non-current tasks, which may limit the scenarios
>>>> in which stack_trace_save_tsk() can be used. Like other architectures,
>>>> or like ARM's unwind_backtrace(), we can leave it up to the caller to
>>>> ensure that the task that needs to be unwound is not running.
>>>>
>>>> Signed-off-by: Li Huafei <lihuafei1@...wei.com>
>>>
>>> That sounds good, but:
>>>
>>>>         if (tsk != current) {
>>>> -#ifdef CONFIG_SMP
>>>> -               /*
>>>> -                * What guarantees do we have here that 'tsk' is not
>>>> -                * running on another CPU?  For now, ignore it as we
>>>> -                * can't guarantee we won't explode.
>>>> -                */
>>>> -               return;
>>>> -#else
>>>> +               /* task blocked in __switch_to */
>>>
>>> The commit text is not consistent with the comment you are removing.
>>>
>>> The commit is talking about "non-current" tasks which is one thing,
>>> but the code is avoiding any tasks under SMP because they may be
>>> running on another CPU. So you need to update the commit
>>> message to say something like "non-current or running on another CPU".
>>>
>>> If this condition will be checked at call sites in following patches,
>>> then mention
>>> that in the commit as well, so we know the end result is that we do
>>> not break it,
>>
>> The generic code stack_trace_save_tsk() does not have this check, and by
>> 'caller' I mean the caller of stack_trace_save_tsk(), expecting the
>> 'caller' to ensure that the task is not running. So in effect this check
>> has been dropped and there is no more guarantee. Sorry for not
>> clarifying the change here.
> 
> Can you prove in every case that the thread we're being asked to unwind
> is not running? I don't think you can.
> 
> There are things like proc_pid_stack() in procfs and the stack traces
> in sysrq-t which have attempted to unwind everything whether it's
> running or not.
> 
> So no, there is no guarantee that the thread is blocked in
> __switch_to().
> 

Yes, I agree.

>> But can we assume that the user should know that the stacktrace is
>> unreliable for a task that is running on another CPU? If not, I should
>> remove this patch and keep the check.
> 
> It's not about "unreliable" stack traces, it's about the unwinder
> killing the kernel.
> 
> The hint is this:
> 
>                 frame.fp = thread_saved_fp(tsk);
>                 frame.sp = thread_saved_sp(tsk);
>                 frame.lr = 0;           /* recovered from the stack */
>                 frame.pc = thread_saved_pc(tsk);
> 
> These access the context saved by the scheduler when the task is
> sleeping. When the thread is running, these saved values will be
> the state when the thread last slept. However, with the thread
> running, the stack could now contain any data what so ever, and
> could change at any moment.
> 

I get it. For example, since the data on the stack is changing,
'*(unsigned long *)fp' could access any illegal address and crash the
kernel.

> Whether the unwind-table unwinder is truely safe in such a
> situation is unknown - we try to ensure that it won't do anything
> stupid, but proving that is a hard task, and we've recently had
> issues with the unwinder even without that.
> 
> So, allowing this feels like we're opening the door to DoS attacks
> from userspace, where userspace sits there reading /proc/*/stack of
> some thread running on a different CPU waiting for the kernel to
> oops itself, possibly holding a lock, resulting in the system
> dying.
> 
> These decisions need to be made by architecture code not generic
> code, particularly where the method of unwinding is architecture
> specific and thus may have criteria defining when its safe to do so.
> 

Thank you for your comments, I'll remove the patch and keep the check.

Thanks,
Huafei