[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <ECBABE12-F76E-4239-B9B6-DB02805DBE0D@netronome.com>
Date: Wed, 27 Mar 2019 17:06:55 +0000
From: Jiong Wang <jiong.wang@...ronome.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Daniel Borkmann <daniel@...earbox.net>, bpf@...r.kernel.org,
netdev@...r.kernel.org, oss-drivers@...ronome.com
Subject: Re: [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that
really need zero extension to high bits
> On 27 Mar 2019, at 16:50, Alexei Starovoitov <alexei.starovoitov@...il.com> wrote:
>
> On Tue, Mar 26, 2019 at 06:05:27PM +0000, Jiong Wang wrote:
>> eBPF ISA specification requires high 32-bit cleared when low 32-bit
>> sub-register is written. This applies to destination register of ALU32 etc.
>> JIT back-ends must guarantee this semantic when doing code-gen.
>>
>> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
>> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
>> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
>> extension sequence to meet such semantic.
>>
>> This is important, because for code the following:
>>
>> u64_value = (u64) u32_value
>> ... other uses of u64_value
>>
>> compiler could exploit the semantic described above and save those zero
>> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
>> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
>> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
>> could go up to more for some arches which requires two shifts for zero
>> extension) because JIT back-end needs to do extra code-gen for all such
>> instructions.
>>
>> However this is not always necessary in case u32_value is never cast into
>> a u64, which is quite normal in real life program. So, it would be really
>> good if we could identify those places where such type cast happened, and
>> only do zero extensions for them, not for the others. This could save a lot
>> of BPF code-gen.
>>
>> Algo:
>> - Record indices of instructions that do sub-register def (write). And
>> these indices need to stay with function state so path pruning and bpf
>> to bpf function call could be handled properly.
>>
>> These indices are kept up to date while doing insn walk.
>>
>> - A full register read on an active sub-register def marks the def insn as
>> needing zero extension on dst register.
>>
>> - A new sub-register write overrides the old one.
>>
>> A new full register write makes the register free of zero extension on
>> dst register.
>>
>> - When propagating register read64 during path pruning, it also marks def
>> insns whose defs are hanging active sub-register, if there is any read64
>> from shown from the equal state.
>>
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@...ronome.com>
>> Signed-off-by: Jiong Wang <jiong.wang@...ronome.com>
>> ---
>> include/linux/bpf_verifier.h | 4 +++
>> kernel/bpf/verifier.c | 85 +++++++++++++++++++++++++++++++++++++++++---
>> 2 files changed, 84 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 27761ab..0ae9a3f 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -181,6 +181,9 @@ struct bpf_func_state {
>> */
>> u32 subprogno;
>>
>> + /* tracks subreg definition. */
>> + s32 subreg_def[MAX_BPF_REG];
>
> Same comment as Ed and another question: why it's not part of bpf_reg_state ?
Thanks for spotting this, indeed, it looks perfect to be merged into bpf_reg_state.
Regards,
Jiong
Powered by blists - more mailing lists