Message-ID: <55c95c15-ccad-bb31-be87-ad17db7cb02a@fb.com>
Date: Wed, 3 Nov 2021 21:23:29 -0700
From: Yonghong Song <yhs@...com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>,
Joe Burton <jevburton.kernel@...il.com>
CC: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, Petar Penkov <ppenkov@...gle.com>,
Stanislav Fomichev <sdf@...gle.com>,
Joe Burton <jevburton@...gle.com>
Subject: Re: [RFC PATCH v3 0/3] Introduce BPF map tracing capability
On 11/3/21 10:49 AM, Alexei Starovoitov wrote:
> On Wed, Nov 3, 2021 at 10:45 AM Joe Burton <jevburton.kernel@...il.com> wrote:
>>
>> Sort of - I hit issues when defining the function in the same
>> compilation unit as the call site. For example:
>>
>> static noinline int bpf_array_map_trace_update(struct bpf_map *map,
>> void *key, void *value, u64 map_flags)
>
> Not quite :)
> You've had this issue because of 'static noinline'.
> Just 'noinline' would not have such issues even in the same file.
This does not seem to be true. With the latest trunk clang:
[$ ~/tmp2] cat t.c
int __attribute__((noinline)) foo() { return 1; }
int bar() { return foo() + foo(); }
[$ ~/tmp2] clang -O2 -c t.c
[$ ~/tmp2] llvm-objdump -d t.o
t.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: b8 01 00 00 00 movl $1, %eax
5: c3 retq
6: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
0000000000000010 <bar>:
10: b8 02 00 00 00 movl $2, %eax
15: c3 retq
[$ ~/tmp2]
The compiler performed the optimization even though the original noinline
function is still in the binary: bar() never calls foo() and simply
returns the constant 2.
A single foo() call in bar() has the same effect.
Adding asm("") does preserve the calls:
[$ ~/tmp2] cat t.c
int __attribute__((noinline)) foo() { asm(""); return 1; }
int bar() { return foo() + foo(); }
[$ ~/tmp2] clang -O2 -c t.c
[$ ~/tmp2] llvm-objdump -d t.o
t.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: b8 01 00 00 00 movl $1, %eax
5: c3 retq
6: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
0000000000000010 <bar>:
10: 50 pushq %rax
11: e8 00 00 00 00 callq 0x16 <bar+0x6>
16: e8 00 00 00 00 callq 0x1b <bar+0xb>
1b: b8 02 00 00 00 movl $2, %eax
20: 59 popq %rcx
21: c3 retq
[$ ~/tmp2]
Note that with asm(""), foo() is called twice, but the optimizer still
knows that foo() returns 1, so it computes the sum at compile time,
assigns 2 to %eax and returns, discarding the results of the two calls.
Without asm(""), a single foo() call in bar() is folded away in the same way:
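If the goal is to keep both the call and its result opaque to the optimizer, one option is to pass the return value through an empty asm with an input/output operand, similar in spirit to the kernel's OPTIMIZER_HIDE_VAR() macro. A minimal sketch, assuming GCC/clang extended-asm syntax; hide_var() is a hypothetical helper name, not something from the patch:

```c
/*
 * Sketch, assuming GCC/clang extended asm. hide_var() is a hypothetical
 * helper modeled on the kernel's OPTIMIZER_HIDE_VAR(): the empty asm
 * claims to read and rewrite the register holding 'var', so the compiler
 * can no longer treat its value as a known constant.
 */
#define hide_var(var) __asm__ volatile("" : "+r"(var))

int __attribute__((noinline)) foo(void)
{
	int ret = 1;

	hide_var(ret);	/* callers can no longer assume foo() returns 1 */
	return ret;
}

int bar(void)
{
	return foo() + foo();	/* sum of two opaque values, computed at run time */
}
```

Compiling this with clang -O2 -c and inspecting it with llvm-objdump -d, as in the transcripts here, is a quick way to check whether bar() keeps both calls and adds their results at run time instead of loading a constant.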
[$ ~/tmp2] cat t.c
int __attribute__((noinline)) foo() { return 1; }
int bar() { return foo(); }
[$ ~/tmp2] clang -O2 -c t.c
[$ ~/tmp2] llvm-objdump -d t.o
t.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: b8 01 00 00 00 movl $1, %eax
5: c3 retq
6: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
0000000000000010 <bar>:
10: b8 01 00 00 00 movl $1, %eax
15: c3 retq
[$ ~/tmp2]
I checked with a few LLVM compiler engineers at Facebook.
They said nothing prevents the compiler from looking inside a
noinline function and optimizing its callers based on what it
finds there, e.g. a constant return value.
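Another option, which as far as I know both GCC and clang honor, is to give the hook weak linkage: a weak definition may be replaced by a strong one at link time, so the compiler should not use its body to optimize callers. A minimal sketch; trace_hook() and do_update() are hypothetical names, not the functions in the patch:

```c
/*
 * Sketch: a weak, noinline hook. Because a weak definition may be
 * overridden by a strong one at link time, the optimizer is not expected
 * to propagate its return value into callers or to drop the call.
 * trace_hook() and do_update() are hypothetical names.
 */
__attribute__((weak, noinline)) int trace_hook(int key, int value)
{
	(void)key;
	(void)value;
	return 0;	/* default body: no tracer attached */
}

int do_update(int *slot, int key, int value)
{
	*slot = value;
	return trace_hook(key, value);	/* call expected to survive -O2 */
}
```

The trade-off is that weak linkage changes symbol semantics for the whole binary, whereas the asm-based approach only affects the optimizer's view of one function.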
>
> Reminder: please don't top post and trim your replies.
>