linux-kernel - Re: Test 73 Sig_trap fails on arm64 (was Re: [PATCH] perf test: Test 73 Sig

Message-ID: <8c582e45-0954-a2ea-764a-4dd78a464988@huawei.com>
Date:   Wed, 16 Feb 2022 11:46:54 +0000
From:   John Garry <john.garry@...wei.com>
To:     Will Deacon <will@...nel.org>
CC:     Leo Yan <leo.yan@...aro.org>, Marco Elver <elver@...gle.com>,
        "Thomas Richter" <tmricht@...ux.ibm.com>,
        <linux-kernel@...r.kernel.org>, <linux-perf-users@...r.kernel.org>,
        <acme@...nel.org>, <svens@...ux.ibm.com>, <gor@...ux.ibm.com>,
        <sumanthk@...ux.ibm.com>, <hca@...ux.ibm.com>,
        "Mark Rutland" <mark.rutland@....com>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>, <dvyukov@...gle.com>
Subject: Re: Test 73 Sig_trap fails on arm64 (was Re: [PATCH] perf test: Test
 73 Sig_trap fails on s390)

Hi Will,

> Sorry, I haven't had time to look at this (or the thousands of other mails
> in my inbox) lately.
> 

Thanks

> I don't recall all of the details, but basically hw_breakpoint really
> doesn't work well on arm/arm64 -- the sticking points are around handling
> the stepping and whether to step into or over exceptions. Sadly, our ptrace
> interface (which is what is used by GDB) is built on top of hw_breakpoint,
> so we can't just rip it out and any significant changes are pretty risky.
> 
> What I would like to happen is that we rework our debug exception handling
> as outlined by [1] so that kernel debug is better defined and the ptrace
> interface can interact directly with the debug architecture instead of being
> funnelled through hw_breakpoint. Once we have that, I think we could try to
> improve hw_breakpoint much more comfortably (or at least defeature it
> considerably without having to worry about breaking GDB). I started this a
> couple of years ago, but I haven't found time to get back to it for ages.
> 
> Anyway, to this specific test...
> 
> When we hit a break/watchpoint the faulting PC points at the instruction
> which faulted and the exception is reported before the instruction has had
> any other side-effects (e.g. if a watchpoint triggers on a store, then
> memory will not have been updated when the watchpoint handler runs), so if
> we were to return as usual after reporting the exception to perf then we
> would just hit the same break/watchpoint again and we'd get stuck. GDB
> handles stepping over the faulting instruction, but for perf (and presumably
> these tests), the kernel is expected to handle the step. This handling
> amounts to disabling the break/watchpoint which we think we hit and then
> attempting a hardware single-step. During the step we could run into more
> break/watchpoints on the same instruction, so we'll keep disabling things
> until we eventually manage to complete the step, which is signalled by a
> specific type of debug exception. At this point, we re-enable the
> break/watchpoints and we're good.
> 
> Signals make this messy, as the step logic will step _into_ the signal
> handler -- we have to do this, otherwise we would miss break/watchpoints
> triggered by the signal handler if we had disabled them for the step.
> However, it means that when we return back from the signal handler we will
> run back into the break/watchpoint which we initially stepped over. When
> perf uses SIGTRAP to notify userspace that we hit a break/watchpoint,
> then we'll get stuck because we'll step into the handler every time.
> 
> Hopefully that clears things up a bit. Ideally, the kernel wouldn't
> pretend to handle this stepping at all for arm64, as it adds a bunch of
> complexity and overhead to our context-switch, and I don't think the
> current behaviour is particularly useful.
> 
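
To make the sequence described above concrete, here is a toy user-space
model of it. This is only a sketch and not kernel code: every name in it
(toy_cpu, run, WATCHED_PC and so on) is hypothetical, and the "CPU" is
just a counter. It models disabling the break/watchpoint on a hit, arming
a single step, and re-enabling on step completion, with and without a
signal being delivered on the hit.

```c
#include <stdbool.h>

/* Toy model only: all names are hypothetical, none is a kernel API. */
enum { WATCHED_PC = 2, HANDLER_PC = 100, HANDLER_LEN = 3, END_PC = 5, MAX_HITS = 8 };

struct toy_cpu {
	int pc;          /* next instruction to execute */
	int saved_pc;    /* return address while in the signal handler, -1 if none */
	bool bp_enabled; /* break/watchpoint armed on WATCHED_PC */
	bool stepping;   /* single-step armed */
	int hits;        /* number of times the break/watchpoint fired */
};

/* Returns the number of hits seen before END_PC or MAX_HITS was reached. */
static int run(bool notify_with_signal)
{
	struct toy_cpu c = { .pc = 0, .saved_pc = -1, .bp_enabled = true };

	while (c.pc != END_PC && c.hits < MAX_HITS) {
		if (c.bp_enabled && c.pc == WATCHED_PC) {
			/* The exception is taken before the instruction executes. */
			c.hits++;
			c.bp_enabled = false; /* disable what we think we hit */
			c.stepping = true;    /* arm a single step */
			if (notify_with_signal) {
				/* Signal delivery redirects the PC into the handler. */
				c.saved_pc = c.pc;
				c.pc = HANDLER_PC;
			}
			continue;
		}

		/* Execute one instruction. */
		if (c.saved_pc >= 0 && c.pc == HANDLER_PC + HANDLER_LEN - 1) {
			c.pc = c.saved_pc; /* handler returns to the faulting PC */
			c.saved_pc = -1;
		} else {
			c.pc++;
		}

		if (c.stepping) {
			/* Step completed: re-enable the break/watchpoint. */
			c.stepping = false;
			c.bp_enabled = true;
		}
	}
	return c.hits;
}
```

In this model run(false) sees a single hit and reaches END_PC, while
run(true) keeps re-hitting the same break/watchpoint until the MAX_HITS
cap stops it, because the step is consumed inside the handler and the
faulting instruction itself is never stepped over.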

Right, so what I am hearing altogether is that for now we should just 
skip this test.

And since the kernel does not seem to advertise this capability, we need 
to disable it for specific architectures.
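
Something along these lines is what I have in mind, as a rough sketch
only. The macro and function names are hypothetical, and the list of
architectures is just the two mentioned in this thread (s390 and arm64),
not a definitive set:

```c
/*
 * Hypothetical names throughout; this is not the actual perf test code.
 * The architecture list is an assumption taken from this thread.
 */
#if defined(__s390x__) || defined(__aarch64__)
#define TOY_SIGTRAP_BP_SUPPORTED 0
#else
#define TOY_SIGTRAP_BP_SUPPORTED 1
#endif

enum { TOY_TEST_OK = 0, TOY_TEST_SKIP = -2 };

static int toy_test_sigtrap(void)
{
	/* No runtime capability to query, so decide at compile time. */
	if (!TOY_SIGTRAP_BP_SUPPORTED)
		return TOY_TEST_SKIP;

	/* The real test body would run here. */
	return TOY_TEST_OK;
}
```

The compile-time check is only there because there is nothing to probe
at runtime; if the kernel advertised the capability, a runtime check
would be preferable.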

Thanks,
John

> [1] https://lore.kernel.org/all/20200626095551.GA9312@willie-the-truck/
> .