Message-ID: <87wohhxq1m.fsf@netronome.com>
Date:   Wed, 19 Jun 2019 21:45:25 +0100
From:   Jiong Wang <jiong.wang@...ronome.com>
To:     Edward Cree <ecree@...arflare.com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Andrii Nakryiko <andriin@...com>
Cc:     Jiong Wang <jiong.wang@...ronome.com>,
        "Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        bpf <bpf@...r.kernel.org>,
        Network Development <netdev@...r.kernel.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Jakub Kicinski <jakub.kicinski@...ronome.com>
Subject: Re: [PATCH] bpf: optimize constant blinding


Edward Cree writes:

> On 17/06/2019 21:40, Jiong Wang wrote:
>> Now if we don't split the patch when patching an insn inside a patch,
>> and instead replace the patched insn the way you suggested, then the
>> logic looks to me to become even more complex, something like
>>
>>    for (idx = 0; idx < insn_cnt; idx++) {
>>      if (insns[idx] is not BPF_LIST_INSN) {
>>        do_insn(...)
>>      }
>>      else if (insns[idx] is BPF_LIST_INSN) {
>>        list = pool_base + insn.imm;
>>        while (list) {
>>          insn = list_head->insn;
>>          if (insn is BPF_LIST_INSN) {
>>            sub_list = ...
>>            while ()
>>              do_insn()
>>            continue;
>>          }
>>          do_insn(...)
>>          list = pool_base + list->next;
>>        }
>>      }
>>    }
> Why can't do_insn() just go like:
>     if (insn is BPF_LIST_INSN)
>         for (idx = 0; idx < LIST_COUNT(insn); idx++)
>             do_insn(pool_base + LIST_START(insn) + idx);
>     else
>         rest of processing
> ?
>
> Alternatively, iterate with something more sophisticated than 'idx++'
>  (standard recursion-to-loop transformation).
> You shouldn't ever need a for() tower statically in the code...

I don't think this changes much; the point is we would still have two data
structures for insns, array + list, so I feel you need some tweaks to the
existing traversal code either way, while using a singly linked list alone
incurs very few changes, for example:

  for (i = 0; i < insn_cnt; i++, insn++) {

  =>

  for (elem = list; elem; elem = elem->next) {
    insn = elem->insn;
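
For concreteness, a minimal sketch of the non-pool node layout and of what
a patch then becomes (all names here are hypothetical, not from the actual
patch):

  struct bpf_insn_elem {
          struct bpf_insn insn;
          struct bpf_insn_elem *next;
  };

  /* Replacing one insn with a snippet [head, tail] becomes a splice:
   * overwrite the patched insn with the snippet head, then link the
   * rest in after it; no array copy and no jump offset fixup pass at
   * patch time.  (The now unused head node can be freed or recycled.)
   */
  static void patch_insn(struct bpf_insn_elem *elem,
                         struct bpf_insn_elem *head,
                         struct bpf_insn_elem *tail)
  {
          elem->insn = head->insn;
          if (head != tail) {
                  tail->next = elem->next;
                  elem->next = head->next;
          }
  }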

>> So, I am thinking what Alexei and Andrii suggested makes sense: just use
>> a single data structure (singly linked list) to represent everything, so
>> the insn traversal etc. can stay simple
> But then you have to also store orig_insn_idx with each insn, so you can
>  calculate the new jump offsets when you linearise.  Having an array of
>  patched_orig_insns gives you that for free.

For the pool based list, you don't need to store orig_insn_idx: the orig
insns are guaranteed to sit at the bottom of the pool, so index <
orig_prog_len already tells you a node is an orig insn. And for both the
pool and non-pool based lists, the orig nodes appear in the list in the
same order as in the array, so it is quite easy to calculate the orig
index as a by-product of the insn copy traverse; for the non-pool based
list, each node needs at least one bit to indicate it is an orig node. I
also found that patching a patched buffer which contains a jmp insn is an
issue (need to double check whether such a case exists), because then we
would need to record the jump destination index of the jmp insn when it
was inserted.
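
To make the by-product concrete, a rough sketch of the copy traverse for
the pool based list (field and variable names are hypothetical; 'next' is
a pool index, with 0 as the end-of-list marker here, which works because
nothing ever points back at the head):

  struct bpf_pool_elem {
          struct bpf_insn insn;
          u32 next;       /* pool index of the next node, 0 == end */
  };

  u32 new_idx = 0, orig_idx = 0;
  struct bpf_pool_elem *elem = pool_base;  /* head == orig insn 0 */

  while (elem) {
          new_insns[new_idx] = elem->insn;
          /* Orig insns sit at the bottom of the pool, so the pool index
           * alone identifies them, and since list order matches array
           * order a running counter recovers the orig index for free.
           */
          if ((u32)(elem - pool_base) < orig_prog_len)
                  idx_map[orig_idx++] = new_idx;
          new_idx++;
          elem = elem->next ? pool_base + elem->next : NULL;
  }

The idx_map (orig index -> new index) is then what the linearisation pass
needs for recalculating jump offsets.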

And some updates from my side: I did some benchmarking of both the pool
and non-pool based lists.

Patching time (ns) benchmarked using JIT blinding
===

                       existing     non-pool      pool

 "scale test 1"        no stop      ~0x1c600000   ~0x8800000
 Bench (~3.4K insns)   ~0xc50000    ~0xf1000      ~0x6b000

(Non-pool means kmalloc()ing a list node for each patch snippet; pool
means vmalloc()ing one big chunk of mem and allocating nodes from it, with
a node located as pool_base + index.)
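
A rough sketch of the pool allocation scheme, reusing the hypothetical
bpf_pool_elem layout from the earlier sketch (the non-pool variant simply
kmalloc()s its nodes per patch snippet instead):

  struct insn_pool {
          struct bpf_pool_elem *base;  /* one big vmalloc()ed chunk */
          u32 cnt;                     /* nodes handed out so far */
          u32 cap;                     /* total pool capacity */
  };

  static struct bpf_pool_elem *pool_alloc(struct insn_pool *pool)
  {
          if (pool->cnt >= pool->cap)
                  return NULL;         /* caller must grow or bail out */
          return &pool->base[pool->cnt++];  /* node == base + index */
  }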

For "scale test 1" which contains ~1M JIT blindable insns, using list based
infra for patching could reduce most of the patching time, and pool based
alloc only consumes 1/3 time of non-pool.

And for a normal program with reasonable size (~3.4K), pool based alloc
only consumes 1/30 time of exsisting infra, and 1/2.3 of non-pool.

On the other hand, the non-pool based implementation is cleaner and less
error prone than the pool based one.

And for both pool and non-pool, I got the following kernel warning when
running "scale test 11" (which perhaps needs 3M * 16 ~= 48M of mem):

[   92.319792] WARNING: CPU: 6 PID: 2322 at mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330

I am finishing aux adjust and code delete support.

Regards,
Jiong
