[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d1605c5a-1dfb-65b6-f79f-9c92bb507d4b@huawei.com>
Date: Tue, 26 Jul 2022 20:08:14 +0800
From: Li Huafei <lihuafei1@...wei.com>
To: "Russell King (Oracle)" <linux@...linux.org.uk>
CC: Linus Walleij <linus.walleij@...aro.org>, <ardb@...nel.org>,
<will@...nel.org>, <mark.rutland@....com>, <broonie@...nel.org>,
<peterz@...radead.org>, <mingo@...hat.com>, <acme@...nel.org>,
<alexander.shishkin@...ux.intel.com>, <jolsa@...nel.org>,
<namhyung@...nel.org>, <arnd@...db.de>, <rostedt@...dmis.org>,
<nick.hawkins@....com>, <john@...ozen.org>, <mhiramat@...nel.org>,
<ast@...nel.org>, <linyujun809@...wei.com>,
<ndesaulniers@...gle.com>, <linux-arm-kernel@...ts.infradead.org>,
<linux-kernel@...r.kernel.org>, <linux-perf-users@...r.kernel.org>
Subject: Re: [PATCH 3/5] ARM: stacktrace: Allow stack trace saving for
non-current tasks
On 2022/7/26 17:49, Russell King (Oracle) wrote:
> On Tue, Jul 26, 2022 at 05:12:39PM +0800, Li Huafei wrote:
>>
>>
>> On 2022/7/18 17:07, Linus Walleij wrote:
>>> On Tue, Jul 12, 2022 at 4:18 AM Li Huafei <lihuafei1@...wei.com> wrote:
>>>
>>>> The current ARM implementation of save_stack_trace_tsk() does not allow
>>>> saving stack trace for non-current tasks, which may limit the scenarios
>>>> in which stack_trace_save_tsk() can be used. Like other architectures,
>>>> or like ARM's unwind_backtrace(), we can leave it up to the caller to
>>>> ensure that the task that needs to be unwound is not running.
>>>>
>>>> Signed-off-by: Li Huafei <lihuafei1@...wei.com>
>>>
>>> That sounds good, but:
>>>
>>>> if (tsk != current) {
>>>> -#ifdef CONFIG_SMP
>>>> - /*
>>>> - * What guarantees do we have here that 'tsk' is not
>>>> - * running on another CPU? For now, ignore it as we
>>>> - * can't guarantee we won't explode.
>>>> - */
>>>> - return;
>>>> -#else
>>>> + /* task blocked in __switch_to */
>>>
>>> The commit text is not consistent with the comment you are removing.
>>>
>>> The commit is talking about "non-current" tasks which is one thing,
>>> but the code is avoiding any tasks under SMP because they may be
>>> running on another CPU. So you need to update the commit
>>> message to say something like "non-current or running on another CPU".
>>>
>>> If this condition will be checked at call sites in following patches,
>>> then mention
>>> that in the commit as well, so we know the end result is that we do
>>> not break it,
>>
>> The generic code stack_trace_save_tsk() does not have this check, and by
>> 'caller' I mean the caller of stack_trace_save_tsk(), expecting the
>> 'caller' to ensure that the task is not running. So in effect this check
>> has been dropped and there is no more guarantee. Sorry for not
>> clarifying the change here.
>
> Can you prove in every case that the thread we're being asked to unwind
> is not running? I don't think you can.
>
> There are things like proc_pid_stack() in procfs and the stack traces
> in sysrq-t which have attempted to unwind everything whether it's
> running or not.
>
> So no, there is no guarantee that the thread is blocked in
> __switch_to().
>
Yes, I agree.
>> But can we assume that the user should know that the stacktrace is
>> unreliable for a task that is running on another CPU? If not, I should
>> remove this patch and keep the check.
>
> It's not about "unreliable" stack traces, it's about the unwinder
> killing the kernel.
>
> The hint is this:
>
> frame.fp = thread_saved_fp(tsk);
> frame.sp = thread_saved_sp(tsk);
> frame.lr = 0; /* recovered from the stack */
> frame.pc = thread_saved_pc(tsk);
>
> These access the context saved by the scheduler when the task is
> sleeping. When the thread is running, these saved values will be
> the state when the thread last slept. However, with the thread
> running, the stack could now contain any data what so ever, and
> could change at any moment.
>
I get it. For example, since the data on the stack is changing,
'*(unsigned long *)fp' could access any illegal address and crash the
kernel.
> Whether the unwind-table unwinder is truely safe in such a
> situation is unknown - we try to ensure that it won't do anything
> stupid, but proving that is a hard task, and we've recently had
> issues with the unwinder even without that.
>
> So, allowing this feels like we're opening the door to DoS attacks
> from userspace, where userspace sits there reading /proc/*/stack of
> some thread running on a different CPU waiting for the kernel to
> oops itself, possibly holding a lock, resulting in the system
> dying.
>
> These decisions need to be made by architecture code not generic
> code, particularly where the method of unwinding is architecture
> specific and thus may have criteria defining when its safe to do so.
>
Thank you for your comments, I'll remove the patch and keep the check.
Thanks,
Huafei
Powered by blists - more mailing lists