Message-ID: <c5fb22629d3f42798def5b63ce834801@AcuMS.aculab.com>
Date: Mon, 16 Dec 2024 09:18:56 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Jiri Olsa' <olsajiri@...il.com>, Oleg Nesterov <oleg@...hat.com>
CC: Peter Zijlstra <peterz@...radead.org>, Andrii Nakryiko
<andrii@...nel.org>, "bpf@...r.kernel.org" <bpf@...r.kernel.org>, Song Liu
<songliubraving@...com>, Yonghong Song <yhs@...com>, John Fastabend
<john.fastabend@...il.com>, Hao Luo <haoluo@...gle.com>, Steven Rostedt
<rostedt@...dmis.org>, Masami Hiramatsu <mhiramat@...nel.org>, Alan Maguire
<alan.maguire@...cle.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-trace-kernel@...r.kernel.org"
<linux-trace-kernel@...r.kernel.org>
Subject: RE: [PATCH bpf-next 08/13] uprobes/x86: Add support to optimize
uprobes
From: Jiri Olsa
> Sent: 16 December 2024 08:09
>
> On Sun, Dec 15, 2024 at 03:14:13PM +0100, Oleg Nesterov wrote:
> > On 12/15, David Laight wrote:
> > >
> > > From: Jiri Olsa
> > > > The optimized uprobe path
> > > >
> > > > - checks the original instruction is 5-byte nop (plus other checks)
> > > > - adds (or uses existing) user space trampoline and overwrites original
> > > > instruction (5-byte nop) with call to user space trampoline
> > > > - the user space trampoline executes uprobe syscall that calls related uprobe
> > > > consumers
> > > > - trampoline returns back to next instruction
> > > ...
> > >
> > > How on earth can you safely overwrite a randomly aligned 5 byte instruction
> > > that might be being prefetched and executed by another thread of the
> > > same process.
> >
> > uprobe_write_opcode() doesn't overwrite the instruction in place.
> >
> > It creates the new page with the same content, overwrites the probed insn in
> > that page, then calls __replace_page().
>
> tbh I wasn't completely sure about that as well, I added selftest
> in patch #11 trying to hit the issue you described and it seems to
> work ok

Actually hitting the timing window is hard.
So 'seems to work ok' doesn't really mean much :-)

It all depends on how hard __replace_page() tries to be atomic.
The page has to change from one backed by the executable to a private
one backed by swap - otherwise you can't write to it.

But the problems arise when the instruction prefetch unit has read
part of the 5-byte instruction (it might even only read half a cache
line at a time).
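
To make the straddling case concrete, here is a minimal sketch (not
taken from the patch) that checks whether a 5-byte instruction at a
given address crosses a fetch boundary. The helper name
insn_straddles() and the 64-byte line / 32-byte half-line sizes are
assumptions for illustration only; real fetch widths depend on the
microarchitecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* Length of the probed instruction: a 5-byte nop, later a 5-byte call. */
#define INSN_LEN 5u

/*
 * Hypothetical helper: returns true if the bytes [addr, addr + INSN_LEN)
 * cross a boundary that is a multiple of 'chunk'.
 * 'chunk' must be a power of two (e.g. 64 for an assumed cache line,
 * 32 for an assumed half-line fetch).
 */
static bool insn_straddles(uint64_t addr, uint64_t chunk)
{
	uint64_t offset = addr & (chunk - 1);	/* position inside the chunk */

	return offset + INSN_LEN > chunk;
}
```

With these assumed sizes, an instruction at offset 0x3d within a line
spans two 64-byte lines, while one at offset 0x1c stays inside the line
but still crosses a 32-byte half-line boundary. Those are the
placements where a fetch could see part of the old bytes and part of
the new ones if the replacement is not atomic with respect to fetch.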

I'm not sure how long the pipeline can sit in that state - but I
can do a memory read of a PCIe address that takes ~3000 clocks.
(And a misaligned AVX-512 read is probably eight 8-byte transfers.)

So I think you need to force an interrupt while the PTE is invalid.
And that needs to be simultaneous on all CPUs running that process.
Stopping the process using ptrace would do it.

David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)