[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzaGLJo2y2vUSQaCB7DZtVP3Q89TzbXO0UFvQvUw+Q2kng@mail.gmail.com>
Date: Fri, 14 Jun 2019 15:28:29 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Jiong Wang <jiong.wang@...ronome.com>,
"Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>,
Daniel Borkmann <daniel@...earbox.net>,
bpf <bpf@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
Michael Ellerman <mpe@...erman.id.au>,
Jakub Kicinski <jakub.kicinski@...ronome.com>
Subject: Re: [PATCH] bpf: optimize constant blinding
On Fri, Jun 14, 2019 at 10:06 AM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Fri, Jun 14, 2019 at 8:13 AM Jiong Wang <jiong.wang@...ronome.com> wrote:
> >
> >
> > Alexei Starovoitov writes:
> >
> > > On Wed, Jun 12, 2019 at 8:25 AM Jiong Wang <jiong.wang@...ronome.com> wrote:
> > >>
> > >>
> > >> Jiong Wang writes:
> > >>
> > >> > Alexei Starovoitov writes:
> > >> >
> > >> >> On Wed, Jun 12, 2019 at 4:32 AM Naveen N. Rao
> > >> >> <naveen.n.rao@...ux.vnet.ibm.com> wrote:
> > >> >>>
> > >> >>> Currently, for constant blinding, we re-allocate the bpf program to
> > >> >>> account for its new size and adjust all branches to accommodate the
> > >> >>> same, for each BPF instruction that needs constant blinding. This is
> > >> >>> inefficient and can lead to soft lockup with sufficiently large
> > >> >>> programs, such as the new verifier scalability test (ld_dw: xor
> > >> >>> semi-random 64 bit imms, test 5 -- with net.core.bpf_jit_harden=2)
> > >> >>
> > >> >> Slowdown you see is due to patch_insn right?
> > >> >> In such case I prefer to fix the scaling issue of patch_insn instead.
> > >> >> This specific fix for blinding only is not addressing the core of the problem.
> > >> >> Jiong,
> > >> >> how is the progress on fixing patch_insn?
> > >>
> > >> And what I have done is I have digested your conversion with Edward, and is
> > >> slightly incline to the BB based approach as it also exposes the inserted
> > >> insn to later pass in a natural way, then was trying to find a way that
> > >> could create BB info in little extra code based on current verifier code,
> > >> for example as a side effect of check_subprogs which is doing two insn
> > >> traversal already. (I had some such code before in the historical
> > >> wip/bpf-loop-detection branch, but feel it might be still too heavy for
> > >> just improving insn patching)
> > >
> > > BB - basic block?
> > > I'm not sure that was necessary.
> > > The idea was that patching is adding stuff to linked list instead
> > > and single pass at the end to linearize it.
> >
> > Just an update and keep people posted.
> >
> > Working on linked list based approach, the implementation looks like the
> > following, mostly a combine of discussions happened and Naveen's patch,
> > please feel free to comment.
> >
> > - Use the reserved opcode 0xf0 with BPF_ALU as new pseudo insn code
> > BPF_LIST_INSN. (0xf0 is also used with BPF_JMP class for tail call).
> >
> > - Introduce patch pool into bpf_prog->aux to keep all patched insns.
> > Pool structure looks like:
> >
> > struct {
> > int num;
> > int prev;
> > int next;
> > } head_0;
> > NUM patched insns for head_0
> > head_1;
> > patched insns for head_1
> > head_2;
> > ...
> >
> > - Now when doing bpf_patch_insn_single, it doesn't change the original
> > prog etc, instead, it merely update the insn at patched offset into a
> > BPF_LIST_INSN, and pushed the patched insns plus a patch header into
> > the patch pool. Fields of BPF_LIST_INSN is updated to setup the links:
> >
> > BPF_LIST_INSN.off += patched_size
> > (accumulating the size attached to this list_insn, it is possible a
> > later patch pass patches insn in the patch pool, this means insn
> > traversal needs to be changed, when seeing BPF_LIST_INSN, should go
> > through the list)
> >
> > BPF_LIST_INSN.imm = offset of the patch header in patch pool
> > (off is 16-bit, imm is 32-bit, the patch pool is 32-bit length, so
> > use imm for keeping offset, meaning a BPF_LIST_INSN can contains no
> > more than 8192 insns, guess it is enough)
> >
> > - When doing linearize:
> > 1. a quick scan of prog->insnsi to know the final
> > image size, would be simple as:
> >
> > fini_size = 0;
> > for_each_insn:
> > if (insn.code == (BPF_ALU | BPF_LIST_HEAD))
> > fini_size += insn->off;
> > else
> > fini_size++;
> >
> > 2. Resize prog into fini_size, and a second scan of prog->insnsi to
> > copy over all insns and patched insns, at the same time generate a
> > auxiliary index array which maps an old index to the new index in
> > final image, like the "clone_index" in Naveen's patch.
> >
> > 3. Finally, a single pass to update branch target, the same algo used
> > by this patch.
> >
> > - The APIs for doing insning patch looks like:
> > bpf_patch_insn_init: init the generic patch pool.
> > bpf_patch_insn_single: push patched insns to the pool.
> > link them to the associated BPF_LIST_INSN.
> > bpf_patch_insn_fini: linearize a bpf_prog contains BPF_LIST_INSN.
> > destroy patch pool in prog->aux.
> >
> > I am trying to making the implementation working with jit blind first to make
> > sure basic things are ready. As JIT blinds happens after verification so no
> > need to both aux update etc. Then will cleanup quite a few things for
> > example patch a patched insn, adjust aux data, what to do with insn delete
> > etc.
>
> explicit indices feels like premature optimization.
> May be use vanilla singly linked list instead?
> Also do we have a case when patched insn will be patched again?
> In such case 'patch insn pool' will become recursive?
> Feels difficult to think through all offsets and indices.
> Whereas with linked list patching patched insns will be inserting
> them into link list.
>
> May be better alternative is to convert the whole program to link list
> of insns with branch targets becoming pointers and insert patched
> insns into this single singly linked list ?
I think converting to a singly-linked list is a good, simple and
straightforward approach. The only downside is 3x more memory (insn +
next pointer + branch pointer for instructions that branch/call), but
it shouldn't be a big deal in practice. But it seems like a good
opportunity to simplify and clean up patching code, so I'd definitely
vote to start with this approach.
Powered by blists - more mailing lists