lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 25 Jul 2016 18:27:00 -0400
From:	David Long <dave.long@...aro.org>
To:	Catalin Marinas <catalin.marinas@....com>
Cc:	Mark Rutland <mark.rutland@....com>,
	Petr Mladek <pmladek@...e.com>,
	Zi Shen Lim <zlim.lnx@...il.com>,
	Will Deacon <will.deacon@....com>,
	Andrey Ryabinin <ryabinin.a.a@...il.com>,
	yalin wang <yalin.wang2010@...il.com>,
	Li Bin <huawei.libin@...wei.com>,
	John Blackwood <john.blackwood@...r.com>,
	Pratyush Anand <panand@...hat.com>,
	Daniel Thompson <daniel.thompson@...aro.org>,
	Huang Shijie <shijie.huang@....com>,
	Dave P Martin <Dave.Martin@....com>,
	Jisheng Zhang <jszhang@...vell.com>,
	Vladimir Murzin <Vladimir.Murzin@....com>,
	Steve Capper <steve.capper@...aro.org>,
	Suzuki K Poulose <suzuki.poulose@....com>,
	Marc Zyngier <marc.zyngier@....com>,
	Yang Shi <yang.shi@...aro.org>,
	Mark Brown <broonie@...nel.org>,
	Sandeepa Prabhu <sandeepa.s.prabhu@...il.com>,
	William Cohen <wcohen@...hat.com>,
	Alex Bennée <alex.bennee@...aro.org>,
	Adam Buchbinder <adam.buchbinder@...il.com>,
	linux-arm-kernel@...ts.infradead.org,
	Ard Biesheuvel <ard.biesheuvel@...aro.org>,
	linux-kernel@...r.kernel.org, James Morse <james.morse@....com>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Robin Murphy <robin.murphy@....com>,
	Jens Wiklander <jens.wiklander@...aro.org>,
	Christoffer Dall <christoffer.dall@...aro.org>
Subject: Re: [PATCH v15 04/10] arm64: Kprobes with single stepping support

On 07/25/2016 01:13 PM, Catalin Marinas wrote:
> On Fri, Jul 22, 2016 at 11:51:32AM -0400, David Long wrote:
>> On 07/22/2016 06:16 AM, Catalin Marinas wrote:
>>> On Thu, Jul 21, 2016 at 02:33:52PM -0400, David Long wrote:
>>>> On 07/21/2016 01:23 PM, Marc Zyngier wrote:
>>>>> On 21/07/16 17:33, David Long wrote:
>>>>>> On 07/20/2016 12:09 PM, Marc Zyngier wrote:
>>>>>>> On 08/07/16 17:35, David Long wrote:
>>>>>>>> +#define MAX_INSN_SIZE			1
>>>>>>>> +#define MAX_STACK_SIZE			128
>>>>>>>
>>>>>>> Where is that value coming from? Because even on my 6502, I have a 256
>>>>>>> byte stack.
>>>>>>>
>>>>>>
>>>>>> Although I don't claim to know the original author's thoughts I would
>>>>>> guess it is based on the seven other existing implementations for
>>>>>> kprobes on various architectures, all of which appear to use either 64
>>>>>> or 128 for MAX_STACK_SIZE.  The code is not trying to duplicate the
>>>>>> whole stack.
>>> [...]
>>>>> My main worry is that whatever value you pick, it is always going to be
>>>>> wrong. This is used to preserve arguments that are passed on the stack,
>>>>> as opposed to passed by registers). We have no idea of what is getting
>>>>> passed there so saving nothing, 128 bytes or 2kB is about the same. It
>>>>> is always wrong.
>>>>>
>>>>> A much better solution would be to check the frame pointer, and copy the
>>>>> delta between FP and SP, assuming it fits inside the allocated buffer.
>>>>> If it doesn't, or if FP is invalid, we just skip the hook, because we
>>>>> can't reliably execute it.
>>>>
>>>> Well, this is the way it works literally everywhere else. It is a documented
>>>> limitation (Documentation/kprobes.txt). Said documentation may need to be
>>>> changed along with the suggested fix.
>>>
>>> The document states: "Up to MAX_STACK_SIZE bytes are copied". That means
>>> the arch code could always copy less but never more than MAX_STACK_SIZE.
>>> What we are proposing is that we should try to guess how much to copy
>>> based on the FP value (caller's frame) and, if larger than
>>> MAX_STACK_SIZE, skip the probe hook entirely. I don't think this goes
>>> against the kprobes.txt document but at least it (a) may improve the
>>> performance slightly by avoiding unnecessary copy and (b) it avoids
>>> undefined behaviour if we ever encounter a jprobe with arguments passed
>>> on the stack beyond MAX_STACK_SIZE.
>>
>> OK, it sounds like an improvement. I do worry a little about unexpected side
>> effects.
>
> You get more unexpected side effects by not saving/restoring the whole
> stack. We looked into this on Friday and came to the conclusion that
> there is no safe way for kprobes to know which arguments passed on the
> stack should be preserved, at least not with the current API.
>
> Basically the AArch64 PCS states that for arguments passed on the stack
> (e.g. they can't fit in registers), the caller allocates memory for them
> (on its own stack) and passes the pointer to the callee. Unfortunately,
> the frame pointer seems to be decremented correspondingly to cover the
> arguments, so we don't really have a way to tell how much to copy.
> Copying just the caller's stack frame isn't safe either since a
> callee/caller receiving such argument on the stack may passed it down to
> a callee without copying (I couldn't find anything in the PCS stating
> that this isn't allowed).

OK, so I think we're pretty much back to our starting point.
>
>> I'm just asking if we can accept the existing code as now complete
>> enough (in that I believe it matches the other implementations) and make
>> this enhancement something for the next release cycle, allowing the existing
>> code to be exercised by a wider audience and providing ample time to test
>> the new modification? I'd hate to get stuck in a mode where this patch gets
>> repeatedly delayed for changes that go above and beyond the original design.
>
> The problem is that the original design was done on x86 for its PCS and
> it doesn't always fit other architectures. So we could either ignore the
> problem, hoping that no probed function requires argument passing on
> stack or we copy all the valid data on the kernel stack:
>
> diff --git a/arch/arm64/include/asm/kprobes.h b/arch/arm64/include/asm/kprobes.h
> index 61b49150dfa3..157fd0d0aa08 100644
> --- a/arch/arm64/include/asm/kprobes.h
> +++ b/arch/arm64/include/asm/kprobes.h
> @@ -22,7 +22,7 @@
>
>   #define __ARCH_WANT_KPROBES_INSN_SLOT
>   #define MAX_INSN_SIZE			1
> -#define MAX_STACK_SIZE			128
> +#define MAX_STACK_SIZE			THREAD_SIZE
>
>   #define flush_insn_slot(p)		do { } while (0)
>   #define kretprobe_blacklist_size	0
>

I doubt the ARM PCS is unusual.  At any rate I'm certain there are other 
architectures that pass aggregate parameters on the stack. I suspect 
other RISC(-ish) architectures have similar PCS issues and I think this 
is at least a big part of where this simple copy with a 64/128 limit 
comes from, or at least why it continues to exist.  That said, I'm not 
enthusiastic about researching that assertion in detail as it could be 
time consuming.

I think this (unchecked) limitation for stack frames is something users 
of jprobes understand, or at least should understand from the 
documentation.  At any rate it doesn't sound like we have a way of 
improving it, and I think that's OK.

-dl

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ