[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250520123219.GA18565@willie-the-truck>
Date: Tue, 20 May 2025 13:32:19 +0100
From: Will Deacon <will@...nel.org>
To: Mark Rutland <mark.rutland@....com>
Cc: Nam Cao <namcao@...utronix.de>, Steven Rostedt <rostedt@...dmis.org>,
Gabriele Monaco <gmonaco@...hat.com>,
linux-trace-kernel@...r.kernel.org, linux-kernel@...r.kernel.org,
john.ogness@...utronix.de,
Catalin Marinas <catalin.marinas@....com>,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v6 17/22] arm64: mm: Add page fault trace points
On Mon, May 19, 2025 at 05:17:02PM +0100, Mark Rutland wrote:
> On Fri, May 16, 2025 at 03:04:50PM +0100, Will Deacon wrote:
> > On Wed, Apr 30, 2025 at 01:02:32PM +0200, Nam Cao wrote:
> > > Add page fault trace points, which are useful to implement RV monitor which
> > > watches page faults.
> > >
> > > Signed-off-by: Nam Cao <namcao@...utronix.de>
> > > ---
> > > Cc: Catalin Marinas <catalin.marinas@....com>
> > > Cc: Will Deacon <will@...nel.org>
> > > Cc: linux-arm-kernel@...ts.infradead.org
> > > ---
> > > arch/arm64/mm/fault.c | 8 ++++++++
> > > 1 file changed, 8 insertions(+)
> > >
> > > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > > index ef63651099a9..e3f096b0dffd 100644
> > > --- a/arch/arm64/mm/fault.c
> > > +++ b/arch/arm64/mm/fault.c
> > > @@ -44,6 +44,9 @@
> > > #include <asm/tlbflush.h>
> > > #include <asm/traps.h>
> > >
> > > +#define CREATE_TRACE_POINTS
> > > +#include <trace/events/exceptions.h>
> > > +
> > > struct fault_info {
> > > int (*fn)(unsigned long far, unsigned long esr,
> > > struct pt_regs *regs);
> > > @@ -559,6 +562,11 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
> > > if (kprobe_page_fault(regs, esr))
> > > return 0;
> > >
> > > + if (user_mode(regs))
> > > + trace_page_fault_user(addr, regs, esr);
> > > + else
> > > + trace_page_fault_kernel(addr, regs, esr);
> >
> > Why is this after kprobe_page_fault()?
>
> The kprobe_page_fault() gunk is doing something quite different, and is
> poorly named. That's trying to fixup the PC (and some other state) to
> hide kprobe details from the fault handling logic when an out-of-line
> copy of an instruction somehow triggers a fault.
>
> Logically, that *should* happen before the tracepoints, and shouldn't be
> moved later. For other reasons it needs to be even earlier in the fault
> handling flow, and is currently far too late, but that only ends up
> mattering int he presence of other kernel bugs. For now I think it
> should stay where it is.
I thought these tracepoints were intended to be used by RV, in which
case I'd have thought we'd want as much coverage as possible to reason
about what the kernel is actually doing.
> More details below, for the curious and/or deranged.
>
> The kprobe_page_fault() gunk is trying to fix up the case where an
> instruction has been kprobed, an out-of-line copy of that instruction is
> being stepped, and the out-of-line instruction has triggered a fault.
> When that happens, kprobe_page_fault() tries to reset the faulting PC
> and DAIF such that it looks like the fault was taken from the original
> PC of the probed instruction.
>
> The real logic for that happens in kprobe_fault_handler(), which adjusts
> the values in pt_regs, but does not handle the live DAIF value. It also
> doesn't handle the PMR when pNMI is in use. Due to this, the fault
> handler can run with DAIF bits masked unexpectedly, and a subsequent
> exception return *could* go wrong.
>
> Luckily all code with an extable entry has been blacklisted for kprobes
> since commit:
>
> 888b3c8720e0a403 ("arm64: Treat all entry code as non-kprobe-able")
>
> ... so we should only get here if there's another kernel bug that causes
> an unmarked dereference of a faulting address, in which case we're
> likely to BUG() anyway.
>
> The real fix would be to hoist this out to the arm64 entry code (and
> handle similar for other EL1 exceptions), and get rid of all the
> __kprobes annotations inthe fault code.
This seems to be an argument for removing kprobe_page_fault() entirely,
which is fine, but while it exists it's not obvious to me how it's
supposed to interact with RV.
I suppose the pragmatic thing to do would be to align as closely as
possible with x86, but any documentation/guidance/tests to help us
maintain that would be really helpful. Otherwise, this feels like we're
going to have a repeat of the syscall entry mess where the interaction
with ptrace, audit, seccomp etc was perpetually broken in user-visible
ways.
Will
Powered by blists - more mailing lists