[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <57969234.1070201@linaro.org>
Date: Mon, 25 Jul 2016 18:27:00 -0400
From: David Long <dave.long@...aro.org>
To: Catalin Marinas <catalin.marinas@....com>
Cc: Mark Rutland <mark.rutland@....com>,
Petr Mladek <pmladek@...e.com>,
Zi Shen Lim <zlim.lnx@...il.com>,
Will Deacon <will.deacon@....com>,
Andrey Ryabinin <ryabinin.a.a@...il.com>,
yalin wang <yalin.wang2010@...il.com>,
Li Bin <huawei.libin@...wei.com>,
John Blackwood <john.blackwood@...r.com>,
Pratyush Anand <panand@...hat.com>,
Daniel Thompson <daniel.thompson@...aro.org>,
Huang Shijie <shijie.huang@....com>,
Dave P Martin <Dave.Martin@....com>,
Jisheng Zhang <jszhang@...vell.com>,
Vladimir Murzin <Vladimir.Murzin@....com>,
Steve Capper <steve.capper@...aro.org>,
Suzuki K Poulose <suzuki.poulose@....com>,
Marc Zyngier <marc.zyngier@....com>,
Yang Shi <yang.shi@...aro.org>,
Mark Brown <broonie@...nel.org>,
Sandeepa Prabhu <sandeepa.s.prabhu@...il.com>,
William Cohen <wcohen@...hat.com>,
Alex Bennée <alex.bennee@...aro.org>,
Adam Buchbinder <adam.buchbinder@...il.com>,
linux-arm-kernel@...ts.infradead.org,
Ard Biesheuvel <ard.biesheuvel@...aro.org>,
linux-kernel@...r.kernel.org, James Morse <james.morse@....com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Robin Murphy <robin.murphy@....com>,
Jens Wiklander <jens.wiklander@...aro.org>,
Christoffer Dall <christoffer.dall@...aro.org>
Subject: Re: [PATCH v15 04/10] arm64: Kprobes with single stepping support
On 07/25/2016 01:13 PM, Catalin Marinas wrote:
> On Fri, Jul 22, 2016 at 11:51:32AM -0400, David Long wrote:
>> On 07/22/2016 06:16 AM, Catalin Marinas wrote:
>>> On Thu, Jul 21, 2016 at 02:33:52PM -0400, David Long wrote:
>>>> On 07/21/2016 01:23 PM, Marc Zyngier wrote:
>>>>> On 21/07/16 17:33, David Long wrote:
>>>>>> On 07/20/2016 12:09 PM, Marc Zyngier wrote:
>>>>>>> On 08/07/16 17:35, David Long wrote:
>>>>>>>> +#define MAX_INSN_SIZE 1
>>>>>>>> +#define MAX_STACK_SIZE 128
>>>>>>>
>>>>>>> Where is that value coming from? Because even on my 6502, I have a 256
>>>>>>> byte stack.
>>>>>>>
>>>>>>
>>>>>> Although I don't claim to know the original author's thoughts I would
>>>>>> guess it is based on the seven other existing implementations for
>>>>>> kprobes on various architectures, all of which appear to use either 64
>>>>>> or 128 for MAX_STACK_SIZE. The code is not trying to duplicate the
>>>>>> whole stack.
>>> [...]
>>>>> My main worry is that whatever value you pick, it is always going to be
>>>>> wrong. This is used to preserve arguments that are passed on the stack,
>>>>> as opposed to passed by registers). We have no idea of what is getting
>>>>> passed there so saving nothing, 128 bytes or 2kB is about the same. It
>>>>> is always wrong.
>>>>>
>>>>> A much better solution would be to check the frame pointer, and copy the
>>>>> delta between FP and SP, assuming it fits inside the allocated buffer.
>>>>> If it doesn't, or if FP is invalid, we just skip the hook, because we
>>>>> can't reliably execute it.
>>>>
>>>> Well, this is the way it works literally everywhere else. It is a documented
>>>> limitation (Documentation/kprobes.txt). Said documentation may need to be
>>>> changed along with the suggested fix.
>>>
>>> The document states: "Up to MAX_STACK_SIZE bytes are copied". That means
>>> the arch code could always copy less but never more than MAX_STACK_SIZE.
>>> What we are proposing is that we should try to guess how much to copy
>>> based on the FP value (caller's frame) and, if larger than
>>> MAX_STACK_SIZE, skip the probe hook entirely. I don't think this goes
>>> against the kprobes.txt document but at least it (a) may improve the
>>> performance slightly by avoiding unnecessary copy and (b) it avoids
>>> undefined behaviour if we ever encounter a jprobe with arguments passed
>>> on the stack beyond MAX_STACK_SIZE.
>>
>> OK, it sounds like an improvement. I do worry a little about unexpected side
>> effects.
>
> You get more unexpected side effects by not saving/restoring the whole
> stack. We looked into this on Friday and came to the conclusion that
> there is no safe way for kprobes to know which arguments passed on the
> stack should be preserved, at least not with the current API.
>
> Basically the AArch64 PCS states that for arguments passed on the stack
> (e.g. they can't fit in registers), the caller allocates memory for them
> (on its own stack) and passes the pointer to the callee. Unfortunately,
> the frame pointer seems to be decremented correspondingly to cover the
> arguments, so we don't really have a way to tell how much to copy.
> Copying just the caller's stack frame isn't safe either since a
> callee/caller receiving such argument on the stack may passed it down to
> a callee without copying (I couldn't find anything in the PCS stating
> that this isn't allowed).
OK, so I think we're pretty much back to our starting point.
>
>> I'm just asking if we can accept the existing code as now complete
>> enough (in that I believe it matches the other implementations) and make
>> this enhancement something for the next release cycle, allowing the existing
>> code to be exercised by a wider audience and providing ample time to test
>> the new modification? I'd hate to get stuck in a mode where this patch gets
>> repeatedly delayed for changes that go above and beyond the original design.
>
> The problem is that the original design was done on x86 for its PCS and
> it doesn't always fit other architectures. So we could either ignore the
> problem, hoping that no probed function requires argument passing on
> stack or we copy all the valid data on the kernel stack:
>
> diff --git a/arch/arm64/include/asm/kprobes.h b/arch/arm64/include/asm/kprobes.h
> index 61b49150dfa3..157fd0d0aa08 100644
> --- a/arch/arm64/include/asm/kprobes.h
> +++ b/arch/arm64/include/asm/kprobes.h
> @@ -22,7 +22,7 @@
>
> #define __ARCH_WANT_KPROBES_INSN_SLOT
> #define MAX_INSN_SIZE 1
> -#define MAX_STACK_SIZE 128
> +#define MAX_STACK_SIZE THREAD_SIZE
>
> #define flush_insn_slot(p) do { } while (0)
> #define kretprobe_blacklist_size 0
>
I doubt the ARM PCS is unusual. At any rate I'm certain there are other
architectures that pass aggregate parameters on the stack. I suspect
other RISC(-ish) architectures have similar PCS issues and I think this
is at least a big part of where this simple copy with a 64/128 limit
comes from, or at least why it continues to exist. That said, I'm not
enthusiastic about researching that assertion in detail as it could be
time consuming.
I think this (unchecked) limitation for stack frames is something users
of jprobes understand, or at least should understand from the
documentation. At any rate it doesn't sound like we have a way of
improving it, and I think that's OK.
-dl
Powered by blists - more mailing lists