netdev - Re: [PATCH v5 bpf-next 7/7] bpf, x86_64: use bpf_prog_pack allocator

Message-ID: <F72EEF0D-4F61-4AE1-B2A1-D16A5DBCCC37@fb.com>
Date:   Fri, 21 Jan 2022 17:53:21 +0000
From:   Song Liu <songliubraving@...com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
CC:     Song Liu <song@...nel.org>, bpf <bpf@...r.kernel.org>,
        Network Development <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "Alexei Starovoitov" <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        "Andrii Nakryiko" <andrii@...nel.org>,
        Kernel Team <Kernel-team@...com>,
        "Peter Zijlstra" <peterz@...radead.org>, X86 ML <x86@...nel.org>
Subject: Re: [PATCH v5 bpf-next 7/7] bpf, x86_64: use bpf_prog_pack allocator



> On Jan 20, 2022, at 8:59 PM, Alexei Starovoitov <alexei.starovoitov@...il.com> wrote:
> 
> On Thu, Jan 20, 2022 at 11:13 AM Song Liu <song@...nel.org> wrote:
>> 
>> From: Song Liu <songliubraving@...com>
>> 
>> Use bpf_prog_pack allocator in x86_64 jit.
>> 
>> The program header from bpf_prog_pack is read only during the jit process.
>> Therefore, the binary is first written to a temporary buffer, and later
>> copied to final location with text_poke_copy().
>> 
>> Similarly, jit_fill_hole() is updated to fill the hole with 0xcc using
>> text_poke_copy().
>> 
>> Signed-off-by: Song Liu <songliubraving@...com>
>> ---
>> arch/x86/net/bpf_jit_comp.c | 134 +++++++++++++++++++++++++++---------
>> 1 file changed, 103 insertions(+), 31 deletions(-)
>> 
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index fe4f08e25a1d..6d97f7c24df2 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -216,11 +216,34 @@ static u8 simple_alu_opcodes[] = {
>>        [BPF_ARSH] = 0xF8,
>> };
>> 
>> +static char jit_hole_buffer[PAGE_SIZE] = {};
> 
> Let's not waste a page filled with 0xcc.
> The pack allocator will reserve 128 bytes at the front
> and will round up the tail to 64 bytes.
> So this can be a 128 byte array?
> 
>> +
>> static void jit_fill_hole(void *area, unsigned int size)
>> +{
>> +       struct bpf_binary_header *hdr = area;
>> +       int i;
>> +
>> +       for (i = 0; i < roundup(size, PAGE_SIZE); i += PAGE_SIZE) {
> 
> multi page 0xcc-ing?
> Is it because bpf_jit_binary_alloc_pack() allocates 2MB
> and then populates the whole thing with this?
> Seems overkill.
> 0xcc in the front of the prog and in the back is there
> to catch JIT bugs.
> No need to fill 2MB with it.

I used this logic because the current code memsets the whole buffer to
0xcc. We can change it to fill only the first 128 bytes and the last
64 bytes with 0xcc.

> 
> 
>> +               int s;
>> +
>> +               s = min_t(int, PAGE_SIZE, size - i);
>> +               text_poke_copy(area + i, jit_hole_buffer, s);
>> +       }
>> +
>> +       /*
>> +        * bpf_jit_binary_alloc_pack cannot write size directly to the ro
>> +        * mapping. Write it here with text_poke_copy().
>> +        */
>> +       text_poke_copy(&hdr->size, &size, sizeof(size));
>> +}

[...]

>> @@ -2248,8 +2261,10 @@ int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs)
>> 
>> struct x64_jit_data {
>>        struct bpf_binary_header *header;
>> +       struct bpf_binary_header *tmp_header;
>>        int *addrs;
>>        u8 *image;
>> +       u8 *tmp_image;
> 
> Why add these two fields here?
> With new logic header and image will be zero always?

header and image point into a region of the 2MB page, while tmp_header
and tmp_image point to a temporary buffer from kzalloc(). We need them
in x64_jit_data so that the temporary buffer can be reused across
multiple calls of bpf_int_jit_compile(). It is used like this:

bpf_int_jit_compile(...)
{
	/* ... */

	jit_data = prog->aux->jit_data;
	if (!jit_data) {
		/* kzalloc jit_data */
	}
	addrs = jit_data->addrs;
	if (addrs) {
		/* reuse previous jit_data */
	}

	/* ... */
}

> Maybe rename them instead?
> Or both used during JIT-ing?
> 
>>        int proglen;
>>        struct jit_context ctx;
>> };
>> @@ -2259,6 +2274,7 @@ struct x64_jit_data {
>> 

[...]

>>                        }
>> -                       prog->aux->extable = (void *) image + roundup(proglen, align);
>> +                       if (header->size > bpf_prog_pack_max_size()) {
>> +                               tmp_header = header;
>> +                               tmp_image = image;
>> +                       } else {
>> +                               tmp_header = kzalloc(header->size, GFP_KERNEL);
> 
> the header->size could be just below 2MB.
> I don't think kzalloc() can handle that.

Technically, kzalloc() can handle a 2MB allocation via:
  kzalloc() => kmalloc() => kmalloc_large() => kmalloc_order()

But such a high-order allocation can fail when memory is fragmented.
I guess we should use kvmalloc() instead?

> 
>> +                               if (!tmp_header) {
>> +                                       bpf_jit_binary_free_pack(header);
>> +                                       header = NULL;
>> +                                       prog = orig_prog;
>> +                                       goto out_addrs;
>> +                               }
>> +                               tmp_header->size = header->size;
>> +                               tmp_image = (void *)tmp_header + ((void *)image - (void *)header);
> 
> Why is 'tmp_image' needed at all?
> The above math can be done where necessary.

We pass both image and tmp_image to do_jit(), as it needs both of them.
I think keeping a tmp_image variable makes the logic cleaner. I guess we
can remove it from x64_jit_data, though.

> 
>> +                       }
>> +                       prog->aux->extable = (void *)image + roundup(proglen, align);
> 
> I suspect if you didn't remove the space between (void *) and image
> the diff would be less confusing.
> This line didn't change, right?

Yeah, I forgot why I changed it in the first place. Let me undo it. 

Thanks,
Song

[...]