lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <efaf82d9-8f66-7e0e-0cac-cee38450a224@linux.microsoft.com>
Date:   Tue, 23 Mar 2021 10:22:39 -0500
From:   "Madhavan T. Venkataraman" <madvenka@...ux.microsoft.com>
To:     Mark Rutland <mark.rutland@....com>
Cc:     broonie@...nel.org, jpoimboe@...hat.com, jthierry@...hat.com,
        catalin.marinas@....com, will@...nel.org,
        linux-arm-kernel@...ts.infradead.org,
        live-patching@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark
 a stack trace unreliable



On 3/23/21 9:33 AM, Mark Rutland wrote:
> On Tue, Mar 23, 2021 at 08:31:50AM -0500, Madhavan T. Venkataraman wrote:
>> On 3/23/21 8:04 AM, Mark Rutland wrote:
>>> On Tue, Mar 23, 2021 at 07:46:10AM -0500, Madhavan T. Venkataraman wrote:
>>>> On 3/23/21 5:42 AM, Mark Rutland wrote:
>>>>> On Mon, Mar 15, 2021 at 11:57:56AM -0500, madvenka@...ux.microsoft.com wrote:
>>>>>> From: "Madhavan T. Venkataraman" <madvenka@...ux.microsoft.com>
>>>>>>
>>>>>> EL1 exceptions can happen on any instruction including instructions in
>>>>>> the frame pointer prolog or epilog. Depending on where exactly they happen,
>>>>>> they could render the stack trace unreliable.
>>>>>>
>>>>>> If an EL1 exception frame is found on the stack, mark the stack trace as
>>>>>> unreliable.
>>>>>>
>>>>>> Now, the EL1 exception frame is not at any well-known offset on the stack.
>>>>>> It can be anywhere on the stack. In order to properly detect an EL1
>>>>>> exception frame the following checks must be done:
>>>>>>
>>>>>> 	- The frame type must be EL1_FRAME.
>>>>>>
>>>>>> 	- When the register state is saved in the EL1 pt_regs, the frame
>>>>>> 	  pointer x29 is saved in pt_regs->regs[29] and the return PC
>>>>>> 	  is saved in pt_regs->pc. These must match with the current
>>>>>> 	  frame.
>>>>>
>>>>> Before you can do this, you need to reliably identify that you have a
>>>>> pt_regs on the stack, but this patch uses a heuristic, which is not
>>>>> reliable.
>>>>>
>>>>> However, instead you can identify whether you're trying to unwind
>>>>> through one of the EL1 entry functions, which tells you the same thing
>>>>> without even having to look at the pt_regs.
>>>>>
>>>>> We can do that based on the entry functions all being in .entry.text,
>>>>> which we could further sub-divide to split the EL0 and EL1 entry
>>>>> functions.
>>>>
>>>> Yes. I will check the entry functions. But I still think that we should
>>>> not rely on just one check. The additional checks will make it robust.
>>>> So, I suggest that the return address be checked first. If that passes,
>>>> then we can be reasonably sure that there are pt_regs. Then, check
>>>> the other things in pt_regs.
>>>
>>> What do you think this will catch?
>>
>> I am not sure that I have an exact example to mention here. But I will attempt
>> one. Let us say that a task has called arch_stack_walk() in the recent past.
>> The unwinder may have copied a stack frame onto some location in the stack
>> with one of the return addresses we check. Let us assume that there is some
>> stack corruption that makes a frame pointer point to that exact record. Now,
>> we will get a match on the return address on the next unwind.
> 
> I don't see how this is material to the pt_regs case, as either:
> 
> * When the unwinder considers this frame, it appears to be in the middle
>   of an EL1 entry function, and the unwinder must mark the unwinding as
>   unreliable regardless of the contents of any regs (so there's no need
>   to look at the regs).
> 
> * When the unwinder considers this frame, it does not appear to be in
>   the middle of an EL1 entry function, so the unwinder does not think
>   there are any regs to consider, and so we cannot detect this case.
> 
> ... unless I've misunderstood the example?
> 
> There's a general problem that it's possible to corrupt any portion of
> the chain to skip records, e.g.
> 
>   A -> B -> C -> D -> E -> F -> G -> H -> [final]
> 
> ... could get corrupted to:
> 
>   A -> B -> D -> H -> [final]
> 
> ... regardless of whether C/E/F/G had associated pt_regs. AFAICT there's
> no good way to catch this generally unless we have additional metadata
> to check the unwinding against.
> 
> The likelihood of this happening without triggering other checks is
> vanishingly low, and as we don't have a reliable mechanism for detecting
> this, I don't think it's worthwhile attempting to do so.
> 
> If and when we try to unwind across EL1 exception boundaries, the
> potential mismatch between the frame record and regs will be more
> significant, and I agree at that point thisd will need more thought.
> 
>> Pardon me if the example is somewhat crude. My point is that it is
>> highly unlikely but not impossible for the return address to be on the
>> stack and for the unwinder to get an unfortunate match.
> 
> I agree that this is possible in theory, but as above I don't think this
> is a practical concern.
> 

OK. What you say makes sense.

Thanks.

Madhavan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ