lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrWLbw=m8cjExqsHXkKC8Nm6w2-wXp-Ohnb5bbQYDODmtQ@mail.gmail.com>
Date:	Thu, 25 Dec 2014 08:21:47 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	秦承刚(承刚) 
	<chenggang.qcg@...baba-inc.com>
Cc:	root <chenggang.qin@...il.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	秦承刚(承刚) <chenggang.qcg@...bao.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	David Ahern <dsahern@...il.com>,
	Ingo Molnar <mingo@...hat.com>, Mike Galbraith <efault@....de>,
	Namhyung Kim <namhyung@...il.com>,
	Paul Mackerras <paulus@...ba.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Wu Fengguang <fengguang.wu@...el.com>,
	Yanmin Zhang <yanmin.zhang@...el.com>
Subject: Re: 答复:[PATCH] perf core: Use KSTK_ESP() instead of pt_regs->sp while output user regs

On Thu, Dec 25, 2014 at 7:48 AM, Andy Lutomirski <luto@...capital.net> wrote:
> On Thu, Dec 25, 2014 at 4:13 AM, 秦承刚(承刚) <chenggang.qcg@...baba-inc.com> wrote:
>> The context is NMI (PMU) or IRQ (hrtimer). It is a bit complex. The process
>> we want to sample is the current, so it is always running.
>> We need to distinguish between IRQ context, syscall or user context that are
>> interrupted by NMI.
>
> Oh.  So you really are trying to get the user regs from NMI context
> even if you interrupted the kernel.  This will be an unpleasant thing
> to do correctly.
>
>> Syscall: sp = current->thread.usersp;
>
> Nope.  For x86_64 at least, sp is in old_rsp, not usersp, *if* the
> syscall you interrupted was syscall64 instead of one of the ia32
> entries, which are differently strange.  If you've context switched
> out and back, then usersp will match it.  If not, then you're looking
> at some random stale sp value.
>
> Keep in mind that there is absolutely no guarantee that TIF_IA32
> matches the syscall entry type.
>
>>              old_rsp always point to current->thread.usersp. May be we
>> shouldn't use current_user_stack_pointer();
>> User: sp = task_pt_regs(task)->sp;
>>           current's pt_regs are stored in kernel stack while NMI or IRQ
>> occured.
>
> This is the only easy case.
>
>> IRQ: sp = task_pt_regs(task)->sp;
>>           current's pt_regs are stored in kernel stack while IRQ which was
>> interrupted occured.
>
> Sort of.  It's true by the time you actually execute the IRQ handler.
>
> I think that trying to do this is doomed to either failure or extreme
> complexity.  You're in an NMI, so you could be part-way through a
> context switch or you could be in the very first instruction of the
> syscall handler.
>
> On a quick look, there are plenty of other bugs in there besides just
> the stack pointer issue.  The ABI check that uses TIF_IA32 in the perf
> core is completely wrong.  TIF_IA32 may be equal to the actual
> userspace bitness by luck, but, if so, that's more or less just luck.
> And there's a user_mode test that should be user_mode_vm.
>
> Also, it's not just sp that's wrong.  There are various places that
> you can interrupt in which many of the registers have confusing
> locations.  You could try using the cfi unwind data, but that's
> unlikely to work for regs like cs and ss, and, during context switch,
> this has very little chance of working.

Even the unwinder won't be able to get rbx, rbp, r12, r13, r14, and
r15 right -- good luck handling FORK_LIKE, PTREGSCALL, etc.

--Andy

>
> What's the point of this feature?  Honestly, my suggestion would be to
> delete it instead of trying to fix it.  It's also not clear to me that
> there aren't serious security problems here -- it's entirely possible
> for sensitive *kernel* values to and up in task_pt_regs at certain
> times, and if you run during context switch and there's no code to
> suppress this dump during context switch, then you could be showing
> regs that belong to the wrong task.
>
> --Andy
>
>>
>> Regards
>> Chenggang
>>
>> ------------------------------------------------------------------
>> 发件人:Andy Lutomirski <luto@...capital.net>
>> 发送时间:2014年12月23日(星期二) 16:30
>> 收件人:root <chenggang.qin@...il.com>,linux-kernel
>> <linux-kernel@...r.kernel.org>
>> 抄 送:秦承刚(承刚) <chenggang.qcg@...bao.com>,Andrew Morton
>> <akpm@...ux-foundation.org>,Arjan van de Ven <arjan@...ux.intel.com>,David
>> Ahern <dsahern@...il.com>,Ingo Molnar <mingo@...hat.com>,Mike Galbraith
>> <efault@....de>,Namhyung Kim <namhyung@...il.com>,Paul Mackerras
>> <paulus@...ba.org>,Peter Zijlstra <a.p.zijlstra@...llo.nl>,Wu Fengguang
>> <fengguang.wu@...el.com>,Yanmin Zhang <yanmin.zhang@...el.com>
>> 主 题:Re: [PATCH] perf core: Use KSTK_ESP() instead of pt_regs->sp while
>> output user regs
>>
>> On 12/22/2014 10:22 PM, root wrote:
>>> From: Chenggang Qin <chenggang.qcg@...bao.com>
>>>
>>> For x86_64, the exact value of user stack's esp should be got by
>>> KSTK_ESP(current).
>>> current->thread.usersp is copied from PDA while enter ring0.
>>> Now, we output the value of sp from pt_regs. But pt_regs->sp has changed
>>> before
>>> it was pushed into kernel stack.
>>>
>>> So, we cannot get the correct callchain while unwind some user stacks.
>>> For example, if the stack contains __lll_unlock_wake()/__lll_lock_wait(),
>>> the
>>> callchain will break some times with the latest version of libunwind.
>>> The root cause is the sp that is used by libunwind may be wrong.
>>>
>>> If we use KSTK_ESP(current), the correct callchain can be got everytime.
>>> Other architectures also have KSTK_ESP() macro.
>>>
>>> Signed-off-by: Chenggang Qin <chenggang.qcg@...bao.com>
>>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>>> Cc: Arjan van de Ven <arjan@...ux.intel.com>
>>> Cc: David Ahern <dsahern@...il.com>
>>> Cc: Ingo Molnar <mingo@...hat.com>
>>> Cc: Mike Galbraith <efault@....de>
>>> Cc: Namhyung Kim <namhyung@...il.com>
>>> Cc: Paul Mackerras <paulus@...ba.org>
>>> Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>
>>> Cc: Wu Fengguang <fengguang.wu@...el.com>
>>> Cc: Yanmin Zhang <yanmin.zhang@...el.com>
>>>
>>> ---
>>> arch/x86/kernel/perf_regs.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
>>> index e309cc5..5da8df8 100644
>>> --- a/arch/x86/kernel/perf_regs.c
>>> +++ b/arch/x86/kernel/perf_regs.c
>>> @@ -60,6 +60,9 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
>>> if (WARN_ON_ONCE(idx >= ARRAY_SIZE(pt_regs_offset)))
>>> return 0;
>>>
>>> + if (idx == PERF_REG_X86_SP)
>>> + return KSTK_ESP(current);
>>> +
>>
>> This patch is probably fine, but KSTK_ESP seems to be bogus:
>>
>> unsigned long KSTK_ESP(struct task_struct *task)
>> {
>> return (test_tsk_thread_flag(task, TIF_IA32)) ?
>> (task_pt_regs(task)->sp) : ((task)->thread.usersp);
>> }
>>
>> I swear that every time I've looked at anything that references TIF_IA32
>> in the last two weeks, it's been wrong. This should be something like:
>>
>> if (task_thread_info(task)->status & TS_COMPAT)
>> return task_pt_regs(task)->sp;
>> else if (task == current && task is in a syscall)
>> return current_user_stack_pointer();
>> else if (task is not running && task is in a syscall)
>> return task->thread.usersp;
>> else if (task is not in a syscall)
>> return task_pt_regs(task)->sp;
>> else
>> we're confused; give up.
>>
>> What context are you using KSTK_ESP in?
>>
>> --Andy
>>
>>> return regs_get_register(regs, pt_regs_offset[idx]);
>>> }
>>>
>>>
>
>
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ