Message-ID: <alpine.LRH.2.23.451.2201272217470.8195@MyRouter>
Date: Thu, 27 Jan 2022 22:54:44 +0000 (GMT)
From: Alan Maguire <alan.maguire@...cle.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
cc: Alan Maguire <alan.maguire@...cle.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
Yucong Sun <sunyucong@...il.com>,
Network Development <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>
Subject: Re: [RFC bpf-next 0/3] libbpf: name-based u[ret]probe attach
On Mon, 24 Jan 2022, Andrii Nakryiko wrote:
> On Mon, Jan 24, 2022 at 6:14 AM Alan Maguire <alan.maguire@...cle.com> wrote:
> >
> > I think for users it'd be good to clarify what the overheads are. If I
> > want to see malloc()s in a particular process, say I specify the libc
> > path along with the process ID I'm interested in. This adds the
> > breakpoint to libc and will - as far as I understand it - trigger
> > breakpoints system-wide which are then filtered out by uprobe perf handling
> > for the specific process specified. That's pretty expensive
> > performance-wise, so we should probably try and give users options to
> > limit the cost in cases where they don't want to incur system-wide
> > overheads. I've been investigating adding support for instrumenting shared
> > library calls _within_ programs by placing the breakpoint on the procedure
> > linkage table call associated with the function. As this is local to the
>
> You mean to patch PLT stubs ([0])?
Yep, the .plt section, as shown by "objdump -D -j .plt <program>":
Disassembly of section .plt:
000000000040d020 <.plt>:
  40d020: ff 35 e2 5f 4b 00    pushq  0x4b5fe2(%rip)   # 8c3008 <_GLOBAL_OFFSET_TABLE_+0x8>
  40d026: ff 25 e4 5f 4b 00    jmpq   *0x4b5fe4(%rip)  # 8c3010 <_GLOBAL_OFFSET_TABLE_+0x10>
  40d02c: 0f 1f 40 00          nopl   0x0(%rax)
000000000040d030 <inet_ntop@plt>:
  40d030: ff 25 e2 5f 4b 00    jmpq   *0x4b5fe2(%rip)  # 8c3018 <inet_ntop@...BC_2.2.5>
  40d036: 68 00 00 00 00       pushq  $0x0
  40d03b: e9 e0 ff ff ff       jmpq   40d020 <.plt>
In the case of inet_ntop(), the address would be 0x40d030, on which we
then do the relative address calculation, giving 0xd030 as the address
to be uprobe'd.
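To make the relative address calculation concrete, here is a minimal sketch
rather than the actual patch code. It assumes the loadable segment holding
.plt starts at virtual address 0x400000 and file offset 0, which is
consistent with the 0xd030 result above but is not stated in the objdump
output. The helper name and its parameters are hypothetical and are not
part of libbpf.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a libbpf API): translate a virtual address
 * taken from objdump into the file offset that a uprobe expects.
 *
 * seg_vaddr, seg_offset and seg_filesz describe the loadable segment
 * containing the address; in a real tool they would come from the ELF
 * program headers. Returns false if vaddr lies outside the segment.
 */
bool vaddr_to_file_offset(uint64_t vaddr, uint64_t seg_vaddr,
			  uint64_t seg_offset, uint64_t seg_filesz,
			  uint64_t *file_offset)
{
	/* Reject addresses that are not backed by this segment's file data. */
	if (vaddr < seg_vaddr || vaddr - seg_vaddr >= seg_filesz)
		return false;

	/* Offset within the segment plus the segment's position in the file. */
	*file_offset = vaddr - seg_vaddr + seg_offset;
	return true;
}
```

Called with vaddr 0x40d030, seg_vaddr 0x400000 and seg_offset 0 (the
assumed values), this yields 0xd030 for inet_ntop@plt.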
> One concern with that is (besides
> making sure that pt_regs still have exactly the same semantics and
> stuff) that uprobes are much faster when patching nop instructions (if
> the library was compiled with nop "preambles"). Do you know if @plt
> entries can be compiled with nops as well?
I haven't found any way to do that unfortunately.
> The difference in
> performance is more than 2x from my non-scientific testing recently.
> So it can be a pretty big difference.
>
Interesting! There may be a cleaner way to achieve the goal of
tracing shared library calls in the local binary, but I'm not seeing
an alternate approach yet unfortunately. To me the key thing is
to ensure we have an alternative to globally tracing in libc. I'll send
out the v2 addressing the things you found in the RFC shortly (and that
uses the .plt instrumentation approach). Thanks!
Alan