Message-ID: <20190406034110.o2qzjyl57ypzal3z@ast-mbp.dhcp.thefacebook.com>
Date: Fri, 5 Apr 2019 20:41:12 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Jiong Wang <jiong.wang@...ronome.com>
Cc: Edward Cree <ecree@...arflare.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, bpf@...r.kernel.org,
netdev@...r.kernel.org, oss-drivers@...ronome.com
Subject: Re: [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that
really need zero extension to high bits
On Fri, Apr 05, 2019 at 09:44:49PM +0100, Jiong Wang wrote:
>
> > On 26 Mar 2019, at 18:44, Edward Cree <ecree@...arflare.com> wrote:
> >
> > On 26/03/2019 18:05, Jiong Wang wrote:
> >> The eBPF ISA specification requires the high 32 bits to be cleared when
> >> the low 32-bit sub-register is written. This applies to the destination
> >> register of ALU32 etc. JIT back-ends must guarantee these semantics when
> >> doing code-gen.
> >>
> >> The x86-64 and arm64 ISAs have the same semantics, so the corresponding JIT
> >> back-ends don't need to do extra work. However, 32-bit arches (arm, nfp,
> >> etc.) and some other 64-bit arches (powerpc, sparc, etc.) need an explicit
> >> zero-extension sequence to meet these semantics.
> >>
> >> This is important because, for code like the following:
> >>
> >> u64_value = (u64) u32_value
> >> ... other uses of u64_value
> >>
> >> the compiler could exploit the semantics described above and omit the zero
> >> extensions needed to extend u32_value to u64_value. Hardware, runtime, or
> >> BPF JIT back-ends are responsible for guaranteeing this. Some benchmarks
> >> show that ~40% of total insns are sub-register writes, meaning ~40% extra
> >> code-gen (possibly more on arches that require two shifts for zero
> >> extension), because the JIT back-end needs to emit extra code for all such
> >> instructions.
> >>
> >> However, this is not necessary when u32_value is never cast into a u64,
> >> which is quite normal in real-life programs. So it would be really good
> >> if we could identify the places where such a cast happens and only do
> >> zero extension for them, not for the others. This could save a lot of
> >> BPF code-gen.
> >>
> >> Algo:
> >> - Record the indices of instructions that do a sub-register def (write).
> >> These indices need to stay with the function state so that path pruning
> >> and bpf-to-bpf function calls are handled properly.
> >>
> >> These indices are kept up to date while doing the insn walk.
> >>
> >> - A full register read on an active sub-register def marks the def insn as
> >> needing zero extension on its dst register.
> >>
> >> - A new sub-register write overrides the old one.
> >>
> >> A new full register write makes the dst register free of zero
> >> extension.
> >>
> >> - When propagating register read64 marks during path pruning, also mark
> >> the def insns of any still-active sub-register defs if the equal state
> >> shows a read64 of those registers.
> >>
> >> Reviewed-by: Jakub Kicinski <jakub.kicinski@...ronome.com>
> >> Signed-off-by: Jiong Wang <jiong.wang@...ronome.com>
> >> ---
> >> include/linux/bpf_verifier.h | 4 +++
> >> kernel/bpf/verifier.c | 85 +++++++++++++++++++++++++++++++++++++++++---
> >> 2 files changed, 84 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> >> index 27761ab..0ae9a3f 100644
> >> --- a/include/linux/bpf_verifier.h
> >> +++ b/include/linux/bpf_verifier.h
> >> @@ -181,6 +181,9 @@ struct bpf_func_state {
> >> */
> >> u32 subprogno;
> >>
> >> + /* tracks subreg definition. */
> > Ideally this comment should mention that the stored value is the insn_idx
> > of the writing insn. Perhaps also that this is safe because patching
> > (bpf_patch_insn_data()) only happens after main verification completes.
>
> During full x86_64 host tests, I found a new issue.
>
> “convert_ctx_accesses” can change the load size: a BPF_W load could be
> transformed into BPF_DW or kept as BPF_W, depending on the underlying ctx
> field size. And “convert_ctx_accesses” runs after zero-extension insertion.
>
> So a BPF_W load could have been marked and had zero extensions inserted after
> it; however, the later “convert_ctx_accesses” pass then figures out that its
> transformed load size is actually BPF_DW and rewrites it accordingly. The
> previously inserted zero extensions then break things: the high 32 bits are
> wrongly cleared. For example:
>
> 1: r2 = *(u32 *)(r1 + 80)
> 2: r1 = *(u32 *)(r1 + 76)
> 3: r3 = r1
> 4: r3 += 14
> 5: if r3 > r2 goto +35
>
> Insns 1 and 2 could be turned into BPF_DW loads if they are loading xdp “data”
> and “data_end”. No zero extension should be inserted after them, because it
> would destroy the pointer. However, they are treated as 32-bit loads
> initially, and later, due to the 64-bit uses at insns 3 and 5, they are
> marked as needing zero extension.
>
> I think the field sizes in the *_md structs inside uapi/linux/bpf.h are
> normally the same as those in the real underlying context; only when a field
> is of pointer type could there be a u32 to u64 conversion. So I guess we just
> need to mark the dst register as a full 64-bit register write inside
> check_mem_access for PTR_TO_CTX when the reg type of the dst reg returned by
> check_ctx_access is a pointer type.
Since the register containing ctx->data was used later in the load insn and
its type was pointer, the analysis should have marked it as a 64-bit access.
It feels like there is an issue in propagating 64-bit access through the
parentage chain. Since insn 5 above recognized r2 as a 64-bit access,
how come insn 1 was still allowed to poison the upper bits?